Abstract:

Manifold reconstruction has been extensively studied in the computational geometry community
for the last decade or so, especially in two and three dimensions. Recently, significant
improvements were made in higher dimensions, leading to new methods to reconstruct large classes
of compact subsets of Euclidean space $\mathbb{R}^d$. However, the complexities of these methods scale up
exponentially with $d$, which makes them impractical in medium or high dimensions, even for handling
low-dimensional submanifolds.

In this paper, we introduce a novel approach that stands in between reconstruction and
topological estimation, and whose complexity scales up with the intrinsic dimension of the data. Our
algorithm combines two paradigms: greedy refinement and topological persistence. Specifically,
given a point cloud in $\mathbb{R}^d$, the algorithm builds a set of landmarks iteratively, while maintaining
nested pairs of complexes, whose images in $\mathbb{R}^d$ lie close to the data, and whose persistent homology
eventually coincides with that of the underlying shape. When the data points are sufficiently
densely sampled from a smooth $m$-submanifold of $\mathbb{R}^d$, our method retrieves the homology of the
submanifold in time at most $c(m)\,n^5$, where $n$ is the size of the input and $c(m)$ is a constant
depending solely on $m$. It can also provably handle a wide range of compact subsets of $\mathbb{R}^d$, though
with worse complexities.

Along the way to proving the correctness of our algorithm, we obtain new results on Cech, Rips,
and witness complex filtrations in Euclidean spaces. Specifically, we show how previous results on
unions of balls can be transposed to Cech filtrations. Moreover, we propose a simple framework for
studying the properties of filtrations that are intertwined with the Cech filtration, among which are
the Rips and witness complex filtrations. Finally, we further investigate witness complexes and
quantify a conjecture of Carlsson and de Silva, which states that witness complex filtrations should
have cleaner persistence barcodes than Cech or Rips filtrations, at least on smooth submanifolds
of Euclidean spaces.

Key-words: Reconstruction, Persistent homology, Filtration, Cech complex, Rips complex,
Witness complex, Topological estimation
Towards persistence-based reconstruction in Euclidean spaces

Summary: Manifold reconstruction has been heavily studied over the past decade, in
particular in low dimensions. Recent advances in higher dimensions have led to new
reconstruction methods that can handle point clouds sampled from smooth submanifolds of
$\mathbb{R}^d$ of arbitrary dimensions. However, the complexity of these approaches grows exponentially
with the dimension $d$ of the ambient space, which makes them impractical in medium or high
dimensions, even for reconstructing low-dimensional submanifolds such as curves or surfaces.

In this paper, we introduce a new approach that lies at the boundary between classical
reconstruction and topological inference, and whose complexity grows with the intrinsic
dimension of the data. Our algorithm combines two paradigms: greedy refinement of maxmin type
and topological persistence. More precisely, given a point cloud in $\mathbb{R}^d$, the algorithm builds
a subset of landmarks iteratively, while maintaining a pair of nested simplicial complexes
whose images in $\mathbb{R}^d$ are close to the data, and whose persistent homology coincides with the
homology of the space underlying the data. When the point cloud is sufficiently densely sampled
from a smooth submanifold of $\mathbb{R}^d$, our method retrieves the homology of the manifold in time
$c(m)\,n^5$, where $n$ is the size of the input and $c(m)$ is a constant depending solely on the
intrinsic dimension $m$ of the manifold. Our approach can also reconstruct, with guarantees, a
large class of compact objects in $\mathbb{R}^d$, though with worse running times.

In order to give theoretical guarantees for our algorithm, we study Cech, Rips, and witness
complex filtrations in $\mathbb{R}^d$, for which we present a set of new results. More precisely, we show
how existing results on unions of balls can be transferred to Cech filtrations, and from there
to Rips and witness complex filtrations. We also propose a first quantification of a conjecture
of Carlsson and de Silva, according to which witness complex filtrations give better results
than Cech and Rips filtrations in the context of topological inference, at least in the case of
smooth submanifolds of $\mathbb{R}^d$.

Keywords: Reconstruction, Persistent homology, Filtration, Cech complex, Rips complex,
Witness complex, Topological inference
1 Introduction

The problem of reconstructing unknown structures from finite collections of data samples is
ubiquitous in the Sciences, where it has many different variants, depending on the nature of the
data and on the targeted application. In the last decade or so, the computational geometry
community has taken a strong interest in manifold reconstruction, where the goal is to reconstruct
submanifolds of Euclidean spaces from point clouds. In particular, efficient solutions have been
proposed in dimensions two and three, based on the use of the Delaunay triangulation (see [8] for
a survey). In these methods, the unknown manifold is approximated by a simplicial complex that is
extracted from the full-dimensional Delaunay triangulation of the input point cloud. The success of
this approach is explained by the fact that, not only does it behave well on practical examples,
but the quality of its output is guaranteed by a sound theoretical framework. Indeed, the extracted
complex is usually shown to be equal, or at least close, to the so-called restricted Delaunay
triangulation, a particular subset of the Delaunay triangulation whose approximation power is
well understood on smooth or Lipschitz curves and surfaces [B [21 [6]. Unfortunately, the size of
the Delaunay triangulation grows too fast with the dimension of the ambient space for the approach
to remain tractable in high-dimensional spaces [33].

Recently, significant steps were made towards a full understanding of the potential and
limitations of the restricted Delaunay triangulation on smooth manifolds [14, 35]. In parallel, new
sampling theories were developed, such as the critical point theory for distance functions [9],
which provides sufficient conditions for the topology of a shape $X \subset \mathbb{R}^d$ to be captured by the
offsets of a point cloud $L$ lying at small Hausdorff distance. These advances lay the foundations
of a new theoretical framework for the reconstruction of smooth submanifolds [11^ [M], and more
generally of large classes of compact subsets of $\mathbb{R}^d$ [H [TOl 112]. Combined with the introduction
of more lightweight data structures, such as the witness complex [E], they have led to new
reconstruction techniques in arbitrary Euclidean spaces @|, whose outputs can be guaranteed under
mild sampling conditions, and whose complexities can be orders of magnitude below that of the
classical Delaunay-based approach. For instance, on a data set with $n$ points in $\mathbb{R}^d$, the algorithm
of [4] runs in time $2^{O(d^2)} n^2$, whereas the size of the Delaunay triangulation can be of the order of
$n^{\lceil d/2 \rceil}$. Unfortunately, $2^{O(d^2)} n^2$ still remains too large for these new methods to be practical,
even when the data points lie on or near a very low-dimensional submanifold.

A weaker yet similarly difficult version of the reconstruction paradigm is topological estimation,
where the goal is not to exhibit a data structure that faithfully approximates the underlying shape
$X$, but simply to infer the topological invariants of $X$ from an input point cloud $L$. This problem
has received a lot of attention in recent years, and it finds applications in a number of areas
of Science, such as sensor networks [19], statistical analysis [7], or dynamical systems \^2\ I36j. A
classical approach to learning the homology of $X$ consists in building a nested sequence of spaces
$\mathcal{K}^1 \subseteq \mathcal{K}^2 \subseteq \cdots \subseteq \mathcal{K}^m$, and in studying the persistence of homology classes throughout this
sequence. In particular, it has been independently proved in [12] and [15] that the persistent
homology of the sequence defined by the $\alpha$-offsets of a point cloud $L$ coincides with the homology
of the underlying shape $X$, under sampling conditions that are milder than the ones of [9].
Specifically, if the Hausdorff distance between $L$ and $X$ is less than $\varepsilon$, for some small enough
$\varepsilon$, then, for all $\alpha > \varepsilon$, the canonical inclusion map $L^{\alpha} \hookrightarrow L^{\alpha+2\varepsilon}$ induces homomorphisms between
homology groups, whose images are isomorphic to the homology groups of $X$. Combined with the
structure theorem of [38], which states that the persistent homology of the sequence
$\{L^{\alpha}\}_{\alpha > 0}$ is fully described by a finite set of intervals, called a persistence barcode or a
persistence diagram (see Figure 1, left), the above result means that the homology of $X$ can be
deduced from this barcode, simply by removing the intervals of length less than $2\varepsilon$, which are
therefore viewed as topological noise.
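
The noise-removal step described above can be illustrated by a minimal sketch. It assumes the barcode has already been computed by some persistence software and is given as a list of `(dimension, birth, death)` triples; the function name `betti_from_barcode` and the sample barcode are hypothetical, and the sketch ignores any restriction on the range of scales over which the guarantee holds.

```python
from collections import Counter


def betti_from_barcode(intervals, eps):
    """Count, per dimension, the intervals of length at least 2 * eps.

    intervals: iterable of (dimension, birth, death) triples;
               death may be float("inf") for classes that never die.
    eps:       assumed bound on the Hausdorff distance between sample and shape.
    """
    betti = Counter()
    for dim, birth, death in intervals:
        # Intervals shorter than 2 * eps are treated as topological noise.
        if death - birth >= 2 * eps:
            betti[dim] += 1
    return dict(betti)


# Hypothetical barcode: one long and one short interval in dimensions 0 and 1.
barcode = [
    (0, 0.0, float("inf")),
    (0, 0.0, 0.05),
    (1, 0.1, 0.9),
    (1, 0.3, 0.35),
]
print(betti_from_barcode(barcode, eps=0.1))  # {0: 1, 1: 1}
```

With `eps=0.1`, only the two long intervals survive, which would be read as one connected component and one independent cycle.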
From an algorithmic point of view, the persistent homology of a nested sequence of simplicial
complexes (called a filtration) can be efficiently computed using the persistence algorithm [22, 38].
Among the many filtrations that can be built on top of a point set $L$, the $\alpha$-shape makes it
possible to reliably recover the homology of the underlying space $X$, since it is known to be a
deformation retract of $L^{\alpha}$ [21]. However, this property is useless in high dimensions, since
computing the $\alpha$-shape requires building the full-dimensional Delaunay triangulation. It is
therefore appealing to consider other filtrations that are easy to compute in arbitrary dimensions,
such as the Rips and witness complex filtrations. Nevertheless, to the best of our knowledge, there
currently exists no equivalent of the result of [12, 15] for such filtrations. In this paper, we
produce such a result, not only for Rips and witness complexes, but more generally for any
filtration that is intertwined with the Cech filtration. Recall that, for all $\alpha > 0$, the Cech
complex $\mathcal{C}^{\alpha}(L)$ is the nerve of the union of the open balls of same radius $\alpha$ about the points of
$L$, i.e. the nerve of $L^{\alpha}$. It follows from the nerve theorem [31, Cor. 4G.3] that $\mathcal{C}^{\alpha}(L)$ and
$L^{\alpha}$ are homotopy equivalent. However, despite the result of [12, 15], this is not sufficient to
prove that the persistent homology of $\mathcal{C}^{\alpha}(L) \hookrightarrow \mathcal{C}^{\alpha+2\varepsilon}(L)$ coincides with the homology of $X$,
mainly because it is not clear whether the homotopy equivalences $\mathcal{C}^{\alpha}(L) \to L^{\alpha}$ and
$\mathcal{C}^{\alpha+2\varepsilon}(L) \to L^{\alpha+2\varepsilon}$ provided by the nerve theorem commute with the canonical inclusions
$\mathcal{C}^{\alpha}(L) \hookrightarrow \mathcal{C}^{\alpha+2\varepsilon}(L)$ and $L^{\alpha} \hookrightarrow L^{\alpha+2\varepsilon}$. Using standard arguments of algebraic topology,
we prove that there exist some homotopy equivalences that do commute with the canonical
inclusions, at least at homology and homotopy levels. This enables us to extend the result of
[12, 15] to the Cech filtration, and from there to the Rips and witness complex filtrations.




Figure 1: Results obtained from a set $W$ of 10,000 points sampled uniformly at random from a
helical curve drawn on the 2d torus. Left: persistence barcode of the Rips filtration, built over
a set of 900 carefully-chosen landmarks. Right: result of our algorithm, applied blindly to the
input $W$. Both methods highlight the two underlying structures: curve and torus.

Another common concern in topological data analysis is the size of the vertex set on top of which
a filtration is built. In many practical situations indeed, the point cloud $W$ given as input samples
the underlying shape very finely. In such situations, it makes sense to build the filtration on top
of a small subset $L$ of landmarks, to avoid a waste of computational resources. However, building
a filtration on top of the sparse landmark set $L$ instead of the dense point cloud $W$ can result in
a significant degradation in the quality of the persistence barcode. This is true in particular with
the Cech and Rips filtrations, whose barcodes can have topological noise of amplitude depending
directly on the density of $L$. The introduction of the witness complex filtration appeared as an
elegant way of solving this issue [18]. The witness complex of $L$ relative to $W$, or $\mathcal{C}_W(L)$ for
short, can be viewed as a relaxed version of the Delaunay triangulation of $L$, in which the points of
$W \setminus L$ are used to drive the construction of the complex [16]. Due to its special nature, which takes
advantage of the points of $W \setminus L$, and due to its close relationship with the restricted Delaunay
triangulation, the witness complex filtration is likely to give persistence barcodes whose topological
noise depends on the density of $W$ rather than on that of $L$, as conjectured in [18]. We prove
in the paper that this statement is only true to some extent, namely: whenever the points of $W$
are sufficiently densely sampled from some smooth submanifold of $\mathbb{R}^d$, the topological noise in the
barcode can be arbitrarily small compared to the density of $L$. Nevertheless, it cannot depend
solely on the density of $W$. This shows that the witness complex filtration does provide cleaner
persistence barcodes than Cech or Rips filtrations, but maybe not as clean as expected.

Taking advantage of the above theoretical results on Rips and witness complexes, we propose
a novel approach to reconstruction that stands somewhere in between the classical reconstruction
and topological estimation paradigms. Our algorithm is a variant of the method of [U [30] that
combines greedy refinement and topological persistence. Specifically, given an input point cloud
$W$, the algorithm builds a subset $L$ of landmarks iteratively, and in the meantime it maintains a
nested pair of simplicial complexes (which happen to be Rips or witness complexes) and computes its
persistent Betti numbers. The outcome of the algorithm is the sequence of nested pairs maintained
throughout the process, or rather the diagram of evolution of their persistent Betti numbers. Using
this diagram, a user or software agent can determine a relevant scale at which to process the data.
It is then easy to rebuild the corresponding set of landmarks, as well as its nested pair of complexes.
Note that our method does not completely solve the classical reconstruction problem, since it does
not exhibit an embedded complex that is close to $X$ topologically and geometrically. Nevertheless,
it comes with theoretical guarantees, it is easy to implement, and above all it has reasonable
complexity. Indeed, in the case where the input point cloud is sampled from a smooth submanifold
$X$ of $\mathbb{R}^d$, we show that the complexity of our algorithm is bounded by $c(m)\,n^5$, where $c(m)$ is a
quantity depending solely on the intrinsic dimension $m$ of $X$, while $n$ is the size of the input. To
the best of our knowledge, this is the first provably-good topological estimation or reconstruction
method whose complexity scales up with the intrinsic dimension of the manifold. In the case where
$X$ is a more general compact set in $\mathbb{R}^d$, our complexity bound becomes $c(d)\,n^5$.
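
The iterative construction of the landmark set can be sketched as follows. This is only an illustration, assuming that the greedy refinement is the usual maxmin (farthest-point) rule; the function name `maxmin_landmarks` and its parameters are hypothetical, and the maintenance of the nested complexes and of their persistent Betti numbers is left out.

```python
import numpy as np


def maxmin_landmarks(W, n_landmarks, first=0):
    """Greedy farthest-point selection of landmark indices among the rows of W.

    Returns (indices, radii), where radii[i] is the Hausdorff distance between
    W and the first i + 1 landmarks, i.e. the largest distance from a point of
    W to its nearest chosen landmark (the landmarks are points of W).
    """
    W = np.asarray(W, dtype=float)
    n_landmarks = min(n_landmarks, len(W))
    indices = [first]
    # dist[j] = distance from W[j] to its nearest landmark chosen so far.
    dist = np.linalg.norm(W - W[first], axis=1)
    radii = [float(dist.max())]
    while len(indices) < n_landmarks:
        nxt = int(np.argmax(dist))  # point of W farthest from the landmarks
        indices.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(W - W[nxt], axis=1))
        radii.append(float(dist.max()))
    return indices, radii


# Four points on a line: the far outlier is picked right after the seed.
indices, radii = maxmin_landmarks([[0.0], [1.0], [2.0], [10.0]], 3)
print(indices, radii)  # [0, 3, 2] [10.0, 2.0, 1.0]
```

The decreasing sequence `radii` plays the role of the scale parameter: each prefix of `indices` is a landmark set whose Hausdorff distance to the input is the corresponding radius.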

The paper is organized as follows: after introducing the Cech, Rips, and witness complex
filtrations in Section 2, we prove our structural results in Sections 3 and 4, focusing on the general
case of compact subsets of $\mathbb{R}^d$ in Section 3, and more specifically on the case of smooth
submanifolds of $\mathbb{R}^d$ in Section 4. Finally, we present our algorithm and its analysis in Section 5.

2 Various complexes and their relationships 

The definitions, results and proofs of this section hold in any arbitrary metric space. However,
for the sake of consistency with the rest of the paper, we state them in the particular case of $\mathbb{R}^d$,
endowed with the Euclidean norm. As a consequence, our bounds are not the tightest possible for
the Euclidean case, but they are for the general metric case. Using specific properties of Euclidean
spaces, it is indeed possible to work out somewhat tighter bounds, but at the price of a loss of
simplicity in the statements.

For any compact set $X \subseteq \mathbb{R}^d$, we call $\mathrm{diam}(X)$ the diameter of $X$, and $\mathrm{diam}_{CC}(X)$ the
component-wise diameter of $X$, defined by: $\mathrm{diam}_{CC}(X) = \inf_i \mathrm{diam}(X_i)$, where the $X_i$ are the
path-connected components of $X$. Finally, given two compact sets $X, Y$ in $\mathbb{R}^d$, we call $d_H(X, Y)$
their Hausdorff distance.
Cech complex. Given a finite set $L$ of points of $\mathbb{R}^d$ and a positive number $\alpha$, we call $L^{\alpha}$ the
union of the open balls of radius $\alpha$ centered at the points of $L$: $L^{\alpha} = \bigcup_{x \in L} B(x, \alpha)$. This definition
makes sense only for $\alpha > 0$, since for $\alpha = 0$ we get $L^{\alpha} = \emptyset$. We also denote by $\{L^{\alpha}\}$ the open
cover of $L^{\alpha}$ formed by the open balls of radius $\alpha$ centered at the points of $L$. The Cech complex
of $L$ of parameter $\alpha$, or $\mathcal{C}^{\alpha}(L)$ for short, is the nerve of this cover, i.e. it is the abstract simplicial
complex whose vertex set is $L$, and such that, for all $k \in \mathbb{N}$ and all $x_0, \dots, x_k \in L$, $[x_0, \dots, x_k]$ is
a $k$-simplex of $\mathcal{C}^{\alpha}(L)$ if and only if $B(x_0, \alpha) \cap \dots \cap B(x_k, \alpha) \neq \emptyset$.

Rips complex. Given a finite set $L \subset \mathbb{R}^d$ and a positive number $\alpha$, the Rips complex of $L$ of
parameter $\alpha$, or $\mathcal{R}^{\alpha}(L)$ for short, is the abstract simplicial complex whose $k$-simplices correspond
to unordered $(k+1)$-tuples of points of $L$ which are pairwise within Euclidean distance $\alpha$ of one
another. The Rips complex is closely related to the Cech complex, as stated in the following
standard lemma, whose proof is recalled for completeness:

Lemma 2.1 For every finite set $L \subset \mathbb{R}^d$ and all $\alpha > 0$, we have: $\mathcal{C}^{\alpha/2}(L) \subseteq \mathcal{R}^{\alpha}(L) \subseteq \mathcal{C}^{\alpha}(L)$.

Proof. The proof is standard. Let $[x_0, \dots, x_k]$ be an arbitrary $k$-simplex of $\mathcal{C}^{\alpha/2}(L)$. The Euclidean
balls of same radius $\alpha/2$ centered at the $x_i$ have a non-empty common intersection in $\mathbb{R}^d$. Let $p$ be a
point in the intersection. We then have: $\forall\, 0 \le i, j \le k$, $\|x_i - x_j\| \le \|x_i - p\| + \|p - x_j\| < \alpha$. This
implies that $[x_0, \dots, x_k]$ is a simplex of $\mathcal{R}^{\alpha}(L)$, which proves the first inclusion of the lemma.

Let now $[x_0, \dots, x_k]$ be an arbitrary $k$-simplex of $\mathcal{R}^{\alpha}(L)$. We have $\|x_0 - x_i\| < \alpha$ for all
$i = 0, \dots, k$. This means that $x_0$ belongs to all the Euclidean balls $B(x_i, \alpha)$, which therefore have
a non-empty common intersection in $\mathbb{R}^d$. It follows that $[x_0, \dots, x_k]$ is a simplex of $\mathcal{C}^{\alpha}(L)$, which
proves the second inclusion of the lemma. □
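
The two inclusions of Lemma 2.1 can be checked on a small example. The sketch below is a brute-force illustration, not an efficient construction: the helper names (`miniball_radius`, `rips`, `cech`) are hypothetical, strict inequalities are assumed throughout to match the open balls, and the Cech test is only implemented up to triangles, using the fact that open balls of radius $\alpha$ share a point exactly when the smallest enclosing ball of their centers has radius less than $\alpha$.

```python
import itertools
import numpy as np


def miniball_radius(points):
    """Radius of the smallest ball enclosing 1, 2 or 3 points of R^d."""
    pts = [np.asarray(p, dtype=float) for p in points]
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return float(np.linalg.norm(pts[0] - pts[1])) / 2
    a, b, c = sorted(
        float(np.linalg.norm(p - q)) for p, q in itertools.combinations(pts, 2)
    )
    if c * c >= a * a + b * b:  # right, obtuse or degenerate: longest edge is a diameter
        return c / 2
    s = (a + b + c) / 2  # Heron's formula for the area
    area = np.sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)  # circumradius of an acute triangle


def rips(points, alpha, max_dim=2):
    """Simplices (as index tuples) whose vertices are pairwise closer than alpha."""
    n = len(points)
    simplices = set()
    for size in range(1, max_dim + 2):
        for s in itertools.combinations(range(n), size):
            if all(
                np.linalg.norm(np.subtract(points[i], points[j])) < alpha
                for i, j in itertools.combinations(s, 2)
            ):
                simplices.add(s)
    return simplices


def cech(points, alpha):
    """Simplices of dimension at most 2 whose open alpha-balls share a point."""
    n = len(points)
    simplices = set()
    for size in range(1, 4):
        for s in itertools.combinations(range(n), size):
            if miniball_radius([points[i] for i in s]) < alpha:
                simplices.add(s)
    return simplices


# Equilateral triangle with unit side: circumradius 1/sqrt(3), about 0.577.
pts = [(0.0, 0.0), (1.0, 0.0), (0.5, np.sqrt(3) / 2)]
alpha = 1.05
print(cech(pts, alpha / 2) <= rips(pts, alpha) <= cech(pts, alpha))  # True
```

On this example the first inclusion is strict: the triangle belongs to the Rips complex of parameter 1.05 but not to the Cech complex of parameter 0.525, since the three balls of that radius have no common point.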

Witness complex. Let $L$ be a finite subset of $\mathbb{R}^d$, referred to as the landmark set, and let $W$ be
another (possibly infinite) subset of $\mathbb{R}^d$, identified as the witness set. Let also $\alpha \in [0, \infty)$.

- Given a point $w \in W$ and a $k$-simplex $\sigma$ with vertices in $L$, $w$ is an $\alpha$-witness of $\sigma$ (or,
equivalently, $w$ $\alpha$-witnesses $\sigma$) if the vertices of $\sigma$ lie within Euclidean distance $(d_k(w) + \alpha)$ of
$w$, where $d_k(w)$ denotes the Euclidean distance between $w$ and its $(k+1)$th nearest landmark
in the Euclidean metric.

- The $\alpha$-witness complex of $L$ relative to $W$, or $\mathcal{C}_W^{\alpha}(L)$ for short, is the maximum abstract
simplicial complex, with vertices in $L$, whose faces are $\alpha$-witnessed by points of $W$.
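
The two items above translate fairly directly into a brute-force construction. The following sketch is an illustration under stated assumptions: the function name `witness_complex` is hypothetical, the witness set is taken to be finite, "within distance" is read as a non-strict inequality, and a simplex is kept when it is $\alpha$-witnessed and all its facets have already been kept (which, by induction on the dimension, means that all its faces are $\alpha$-witnessed).

```python
import itertools
import numpy as np


def witness_complex(L, W, alpha=0.0, max_dim=2):
    """Alpha-witness complex of landmarks L relative to a finite witness set W.

    Simplices are returned as sorted tuples of landmark indices.
    """
    L = np.asarray(L, dtype=float)
    W = np.asarray(W, dtype=float)
    # D[w, l] = Euclidean distance between witness w and landmark l.
    D = np.linalg.norm(W[:, None, :] - L[None, :, :], axis=2)
    D_sorted = np.sort(D, axis=1)
    simplices = set()
    for k in range(0, min(max_dim, len(L) - 1) + 1):
        # d_k(w) + alpha, with d_k(w) the distance to the (k+1)th nearest landmark.
        threshold = D_sorted[:, k] + alpha
        for s in itertools.combinations(range(len(L)), k + 1):
            facets_ok = k == 0 or all(
                f in simplices for f in itertools.combinations(s, k)
            )
            witnessed = bool(
                np.any(np.all(D[:, list(s)] <= threshold[:, None], axis=1))
            )
            if facets_ok and witnessed:
                simplices.add(s)
    return simplices


# Three landmarks on a line, witnessed by four points of the same line.
landmarks = [[0.0], [1.0], [3.0]]
witnesses = [[0.0], [0.5], [1.0], [3.0]]
print(sorted(witness_complex(landmarks, witnesses), key=lambda s: (len(s), s)))
# [(0,), (1,), (2,), (0, 1), (1, 2)]
```

Increasing `alpha` relaxes the witness test: with `alpha=2.0` on the same input, the edge between the two extreme landmarks and the triangle both enter the complex.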

When $\alpha = 0$, the $\alpha$-witness complex coincides with the standard witness complex $\mathcal{C}_W(L)$, introduced
in [17]. The $\alpha$-witness complex is also closely related to the Cech complex, though the relationship
is a bit more subtle than in the case of the Rips complex:

Lemma 2.2 Let $L, W \subseteq \mathbb{R}^d$ be such that $L$ is finite. If every point of $L$ lies within Euclidean
distance $l$ of $W$, then for all $\alpha > l$ we have: $\mathcal{C}^{\frac{\alpha - l}{2}}(L) \subseteq \mathcal{C}_W^{\alpha}(L)$. In addition, if the Euclidean
distance from any point of $W$ to its second nearest neighbor in $L$ is at most $l'$, then for all $\alpha > 0$
we have: $\mathcal{C}_W^{\alpha}(L) \subseteq \mathcal{C}^{2(\alpha + l')}(L)$.

Proof. Let $[x_0, \dots, x_k]$ be a $k$-simplex of $\mathcal{C}^{\frac{\alpha - l}{2}}(L)$. This means that $\bigcap_{i=0}^{k} B(x_i, \frac{\alpha - l}{2}) \neq \emptyset$, and as
a result, that $\|x_0 - x_i\| < \alpha - l$ for all $i = 0, \dots, k$. Let $w$ be a point of $W$ closest to $x_0$ in the
Euclidean metric. By the hypothesis of the lemma, we have $\|w - x_0\| \le l$, therefore $x_0, \dots, x_k$ lie
within Euclidean distance $\alpha$ of $w$. Since the Euclidean distances from $w$ to its nearest points of $L$
are non-negative, $w$ is an $\alpha$-witness of $[x_0, \dots, x_k]$ and of all its faces. As a result, $[x_0, \dots, x_k]$ is a
simplex of $\mathcal{C}_W^{\alpha}(L)$.

Consider now a $k$-simplex $[x_0, \dots, x_k]$ of $\mathcal{C}_W^{\alpha}(L)$. If $k = 0$, then the simplex is a vertex $[x_0]$, and
therefore it belongs to $\mathcal{C}^{\alpha'}(L)$ for all $\alpha' > 0$. Assume now that $k \ge 1$. Edges $[x_0, x_1], \dots, [x_0, x_k]$
belong also to $\mathcal{C}_W^{\alpha}(L)$, hence they are $\alpha$-witnessed by points of $W$. Let $w_i$ be an $\alpha$-witness of
$[x_0, x_i]$. Distances $\|w_i - x_0\|$ and $\|w_i - x_i\|$ are bounded from above by $d_2(w_i) + \alpha$, where $d_2(w_i)$ is
the Euclidean distance from $w_i$ to its second nearest point of $L$, which by assumption is at most $l'$.
It follows that $\|x_0 - x_i\| \le \|x_0 - w_i\| + \|w_i - x_i\| \le 2\alpha + 2l'$. Since this is true for all $i = 0, \dots, k$,
we conclude that $x_0$ belongs to the intersection $\bigcap_{i=0}^{k} B(x_i, 2(\alpha + l'))$, which is therefore non-empty.
As a result, $[x_0, \dots, x_k]$ is a simplex of $\mathcal{C}^{2(\alpha + l')}(L)$. □

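Corollary 2.3 Let $X$ be a compact subset of $\mathbb{R}^d$, and let $L \subseteq W \subseteq \mathbb{R}^d$ be such that $L$ is finite.
Assume that $d_H(X, W) \le \delta$ and that $d_H(W, L) \le \varepsilon$, with $\varepsilon + \delta < \frac{1}{4} \mathrm{diam}_{CC}(X)$. Then, for all
$\alpha > \varepsilon$, we have: $\mathcal{C}^{\frac{\alpha - \varepsilon}{2}}(L) \subseteq \mathcal{C}_W^{\alpha}(L) \subseteq \mathcal{C}^{2\alpha + 6(\varepsilon + \delta)}(L)$. In particular, if $\delta \le \varepsilon < \frac{1}{8} \mathrm{diam}_{CC}(X)$, then,
for all $\alpha \ge 2\varepsilon$ we have: $\mathcal{C}^{\alpha/4}(L) \subseteq \mathcal{C}_W^{\alpha}(L) \subseteq \mathcal{C}^{8\alpha}(L)$.

Proof. Since $d_H(W, L) \le \varepsilon$, every point of $L$ lies within Euclidean distance $\varepsilon$ of $W$. As a result,
the first inclusion of Lemma 2.2 holds with $l = \varepsilon$, that is: $\mathcal{C}^{\frac{\alpha - \varepsilon}{2}}(L) \subseteq \mathcal{C}_W^{\alpha}(L)$.

Now, for every point $w \in W$, there is a point $p \in L$ such that $\|w - p\| \le \varepsilon$. Moreover,
there is a point $x \in X$ such that $\|w - x\| \le \delta$, since we assumed that $d_H(X, W) \le \delta$.
Let $X_x$ be the path-connected component of $X$ that contains $x$. Take an arbitrary value
$\lambda \in (0, \frac{1}{2} \mathrm{diam}_{CC}(X) - 2(\varepsilon + \delta))$, and consider the open ball $B(w, 2(\varepsilon + \delta) + \lambda)$. This ball clearly
intersects $X_x$, since it contains $x$. Furthermore, $X_x$ is not contained entirely in the ball, since
otherwise we would have: $\mathrm{diam}_{CC}(X) \le \mathrm{diam}(X_x) \le 4(\varepsilon + \delta) + 2\lambda$, hereby contradicting the fact
that $\lambda < \frac{1}{2} \mathrm{diam}_{CC}(X) - 2(\varepsilon + \delta)$. Hence, since $X_x$ is path-connected, there is a point $y \in X$ lying
on the bounding sphere of $B(w, 2(\varepsilon + \delta) + \lambda)$. Let $q \in L$ be closest to $y$. We have $\|y - q\| \le \varepsilon + \delta$,
since our hypothesis implies that $d_H(X, L) \le d_H(X, W) + d_H(W, L) \le \delta + \varepsilon$. It follows then from the
triangle inequality that
$\|p - q\| \ge \|w - y\| - \|w - p\| - \|y - q\| \ge 2(\varepsilon + \delta) + \lambda - (\varepsilon + \delta) - (\varepsilon + \delta) = \lambda > 0$.
Thus, $q$ is different from $p$, and therefore the ball $B(w, 3(\varepsilon + \delta) + \lambda)$ contains at least two points
of $L$. Since this is true for arbitrarily small values of $\lambda$, the Euclidean distance from $w$ to its second
nearest neighbor in $L$ is at most $3(\varepsilon + \delta)$. It follows that the second inclusion of Lemma 2.2 holds
with $l' = 3(\varepsilon + \delta)$, that is: $\mathcal{C}_W^{\alpha}(L) \subseteq \mathcal{C}^{2(\alpha + 3(\varepsilon + \delta))}(L)$. □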

As mentioned at the head of the section, slightly tighter bounds can be worked out using specific
properties of Euclidean spaces. For the case of the Rips complex, this was done by de Silva and
Ghrist [19, 27]. Their approach can be combined with ours in the case of the witness complex.

3 Structural properties of filtrations over compact subsets of $\mathbb{R}^d$

Throughout this section, we use classical concepts of algebraic topology, such as homotopy
equivalences, deformation retracts, or singular homology. We refer the reader to [31] for a good
introduction to these concepts.

Given a compact set $X \subset \mathbb{R}^d$, we denote by $d_X$ the distance function defined by $d_X(x) =
\inf\{\|x - y\| : y \in X\}$. Although $d_X$ is not differentiable, it is possible to define a notion of critical
point for distance functions, and we denote by $\mathrm{wfs}(X)$ the weak feature size of $X$, defined as the
smallest positive critical value of the distance function to $X$ [10]. We do not explicitly use the
notion of critical value in the following, but only its relationship with the topology of the offsets
$X^{\alpha} = \{x \in \mathbb{R}^d : d_X(x) < \alpha\}$, stressed in the following result from [29]:
Lemma 3.1 (Isotopy Lemma) If $0 < \alpha < \alpha'$ are such that there is no critical value of $d_X$
in the closed interval $[\alpha, \alpha']$, then $X^{\alpha}$ and $X^{\alpha'}$ are homeomorphic (and even isotopic), and $X^{\alpha'}$
deformation retracts onto $X^{\alpha}$.

In particular the hypothesis of the lemma is satisfied when $0 < \alpha < \alpha' < \mathrm{wfs}(X)$. In other
words, all the offsets of $X$ have the same topology in the interval $(0, \mathrm{wfs}(X))$.

3.1 Results on homology 

We use singular homology with coefficients in an arbitrary field, omitted in our notations. In the
following, we repeatedly make use of the following standard result of linear algebra:

Lemma 3.2 (Sandwich Lemma) Consider the following sequence of homomorphisms between
finite-dimensional vector spaces over a same field: $A \to B \to C \to D \to E \to F$. Assume that
$\mathrm{rank}\,(A \to F) = \mathrm{rank}\,(C \to D)$. Then, this quantity also equals the rank of $B \to E$. In the same
way, if $A \to B \to C \to E \to F$ is a sequence of homomorphisms such that $\mathrm{rank}\,(A \to F) = \dim C$,
then $\mathrm{rank}\,(B \to E) = \dim C$.

Proof. Observe that, for any sequence of homomorphisms $F \xrightarrow{f} G \xrightarrow{g} H$, we have $\mathrm{rank}\,(g \circ f) \le
\min\{\mathrm{rank}\, f, \mathrm{rank}\, g\}$. Applying this fact to maps $A \to F$, $B \to E$, and $C \to D$, which are nested in
the sequence of the lemma, we get: $\mathrm{rank}\,(A \to F) \le \mathrm{rank}\,(B \to E) \le \mathrm{rank}\,(C \to D)$, which proves
the first statement of the lemma. As for the second statement, it is obtained from the first one by
letting $D = C$ and taking $C \to D$ to be the identity map. □
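
The chain of inequalities used in this proof can be checked numerically on small matrices. The sketch below is only an illustration: the helper names `compose` and `nested_ranks` are hypothetical, and each homomorphism is represented by a real matrix acting on column vectors.

```python
from functools import reduce

import numpy as np


def compose(*maps):
    """Matrix of the composition maps[-1] o ... o maps[0]."""
    return reduce(lambda acc, m: m @ acc, maps)


def nested_ranks(a_b, b_c, c_d, d_e, e_f):
    """Ranks of A -> F, B -> E and C -> D for the sequence A -> B -> ... -> F."""
    rank = np.linalg.matrix_rank
    return (
        int(rank(compose(a_b, b_c, c_d, d_e, e_f))),
        int(rank(compose(b_c, c_d, d_e))),
        int(rank(c_d)),
    )


# Dimensions 4 -> 3 -> 2 -> 2 -> 3 -> 4, built from coordinate projections/inclusions.
a_b, b_c, c_d = np.eye(3, 4), np.eye(2, 3), np.eye(2)
d_e, e_f = np.eye(3, 2), np.eye(4, 3)

# The outer and inner ranks agree, so the middle rank is forced to agree too.
print(nested_ranks(a_b, b_c, c_d, d_e, e_f))  # (2, 2, 2)

# If A -> B is the zero map, the hypothesis fails and only the inequalities remain.
print(nested_ranks(np.zeros((3, 4)), b_c, c_d, d_e, e_f))  # (0, 2, 2)
```

The second call shows why the hypothesis matters: the three ranks are always ordered, but the rank of $B \to E$ is only pinned down when the two outer values coincide.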

3.1.1 Cech filtration 

Since the Cech complex is the nerve of a union of balls, its topological invariants can be read from
the structure of its dual union. It turns out that unions of balls have been extensively studied
in the past [9, 12, 15]. Our analysis relies particularly on the following result, which is an easy
extension of Theorem 4.7 of [12]:

Lemma 3.3 Let $X$ be a compact set and $L$ a finite set in $\mathbb{R}^d$, such that $d_H(X, L) < \varepsilon$ for some
$\varepsilon < \frac{1}{4} \mathrm{wfs}(X)$. Then, for all $\alpha, \alpha' \in [\varepsilon, \mathrm{wfs}(X) - \varepsilon]$ such that $\alpha' - \alpha \ge 2\varepsilon$, and for all
$\lambda \in (0, \mathrm{wfs}(X))$, we have: $\forall k \in \mathbb{N}$, $H_k(X^{\lambda}) \cong \mathrm{im}\, i_*$, where $i_* : H_k(L^{\alpha}) \to H_k(L^{\alpha'})$ is the
homomorphism between homology groups induced by the canonical inclusion $i : L^{\alpha} \hookrightarrow L^{\alpha'}$. Given
an arbitrary point $x_0 \in X$, the same conclusion holds for homotopy groups with base-point $x_0$.
Proof. We can assume without loss of generality that $\varepsilon < \alpha < \alpha' - 2\varepsilon < \mathrm{wfs}(X) - 3\varepsilon$, since
otherwise we can replace $\varepsilon$ by any $\varepsilon' \in (d_H(X, L), \varepsilon)$. From the hypothesis we deduce the following
sequence of inclusions:

$$X^{\alpha - \varepsilon} \hookrightarrow L^{\alpha} \hookrightarrow X^{\alpha + \varepsilon} \hookrightarrow L^{\alpha'} \hookrightarrow X^{\alpha' + \varepsilon} \qquad (1)$$

By the Isotopy Lemma 3.1, for all $0 < \beta < \beta' < \mathrm{wfs}(X)$, the canonical inclusion $X^{\beta} \hookrightarrow X^{\beta'}$ is a
homotopy equivalence. As a consequence, Eq. (1) induces a sequence of homomorphisms between
homology groups, such that all homomorphisms between homology groups of $X^{\alpha - \varepsilon}$, $X^{\alpha + \varepsilon}$, $X^{\alpha' + \varepsilon}$
are isomorphisms. It follows then from the Sandwich Lemma 3.2 that $i_* : H_k(L^{\alpha}) \to H_k(L^{\alpha'})$ has
same rank as these isomorphisms. Now, this rank is equal to the dimension of $H_k(X^{\lambda})$, since the
$X^{\beta}$ are homotopy equivalent to $X^{\lambda}$ for all $0 < \beta < \mathrm{wfs}(X)$. It follows that $\mathrm{im}\, i_* \cong H_k(X^{\lambda})$, since
our ring of coefficients is a field. The case of homotopy groups is a little trickier, since replacing
homology groups by homotopy groups does not allow us to use the above rank argument. However,
we can use the same proof as in Theorem 4.7 of [12] to conclude. □

Observe that Lemma 3.3 does not guarantee the retrieval of the homology of $X$. Instead, it
deals with sufficiently small offsets of $X$, which are homotopy equivalent to one another but possibly
not to $X$ itself. In the special case where $X$ is a smooth submanifold of $\mathbb{R}^d$ however, $X^{\lambda}$ and $X$
are homotopy equivalent, and therefore the theorem guarantees the retrieval of the homology of
$X$. From an algorithmic point of view, the main drawback of Lemma 3.3 is that computing
the homology of a union of balls or the image of the homomorphism $i_*$ is usually awkward. As
mentioned in [12, 15], this can be done by computing the persistence of the $\alpha$-shape or $\lambda$-medial
axis filtrations associated with $L$, but there do not exist efficient algorithms to compute these
filtrations in dimension more than 3. In the following we show that we can still reliably obtain the
homology of $X$ from easier to compute filtrations, namely the Rips and witness complex filtrations.

Consider now the Cech complex $\mathcal{C}^{\alpha}(L)$ for any value $\alpha > 0$. By definition, $\mathcal{C}^{\alpha}(L)$ is the nerve of
the open cover $\{L^{\alpha}\}$ of $L^{\alpha}$. Since the elements of $\{L^{\alpha}\}$ are open Euclidean balls, they are convex,
and therefore their intersections are either empty or convex. It follows that $\{L^{\alpha}\}$ satisfies the
hypotheses of the nerve theorem, which implies that $\mathcal{C}^{\alpha}(L)$ and $L^{\alpha}$ are homotopy equivalent (see
e.g. [31, Corollary 4G.3]). We thus get the following diagram, where horizontal arrows are canonical
inclusions, and vertical arrows are homotopy equivalences provided by the nerve theorem:

$$\begin{array}{ccc}
L^{\alpha} & \hookrightarrow & L^{\alpha'} \\
\uparrow & & \uparrow \\
\mathcal{C}^{\alpha}(L) & \hookrightarrow & \mathcal{C}^{\alpha'}(L)
\end{array} \qquad (2)$$

Determining whether this diagram commutes is not straightforward. The following result, based
on standard arguments of algebraic topology, shows that there exist homotopy equivalences between
the union of balls and the Cech complex that make the above diagram commutative at homology
and homotopy levels:

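Lemma 3.4 Let $L$ be a finite set of points in $\mathbb{R}^d$ and let $0 < \alpha < \alpha'$. Then, there exist homotopy
equivalences $\mathcal{C}^{\alpha}(L) \to L^{\alpha}$ and $\mathcal{C}^{\alpha'}(L) \to L^{\alpha'}$ such that, for all $k \in \mathbb{N}$, the diagram of Eq. (2) induces
the following commutative diagrams:

$$\begin{array}{ccc}
H_k(L^{\alpha}) & \to & H_k(L^{\alpha'}) \\
\uparrow & & \uparrow \\
H_k(\mathcal{C}^{\alpha}(L)) & \to & H_k(\mathcal{C}^{\alpha'}(L))
\end{array}
\quad \text{and} \quad
\begin{array}{ccc}
\pi_k(L^{\alpha}) & \to & \pi_k(L^{\alpha'}) \\
\uparrow & & \uparrow \\
\pi_k(\mathcal{C}^{\alpha}(L)) & \to & \pi_k(\mathcal{C}^{\alpha'}(L))
\end{array}$$

where vertical arrows are isomorphisms.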

Proof. Our approach consists in a quick review of the proof of the nerve theorem provided in
Section 4G of [31], and in a simple extension of the main arguments to our context.

As mentioned earlier, the open cover $\{L^{\alpha}\}$ satisfies the conditions of the nerve theorem, namely:
for all points $x_0, \dots, x_k \in L$, $\bigcap_{l=0}^{k} B(x_l, \alpha)$ is either empty, or convex and therefore contractible.
From this cover we construct a topological space $\Delta L^{\alpha}$ as follows: let $\Delta^n$ denote the standard
$n$-simplex, where $n = |L| - 1$. To each non-empty subset $S$ of $L$ we associate the face $[S]$ of $\Delta^n$
spanned by the elements of $S$, as well as the space $B_S(\alpha) = \bigcap_{s \in S} B(s, \alpha) \subseteq L^{\alpha}$. $\Delta L^{\alpha}$ is then the
subspace of $L^{\alpha} \times \Delta^n$ defined by:

$$\Delta L^{\alpha} = \bigcup_{\emptyset \neq S \subseteq L} B_S(\alpha) \times [S]$$



RR n° 6391 (Chazal & Oudot)


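The space $\Delta L^{\alpha'}$ is built similarly. The product structures of $\Delta L^{\alpha}$ and $\Delta L^{\alpha'}$ imply the existence
of canonical projections $p_{\alpha} : \Delta L^{\alpha} \to L^{\alpha}$ and $p_{\alpha'} : \Delta L^{\alpha'} \to L^{\alpha'}$. These projections commute with
the canonical inclusions $\Delta L^{\alpha} \hookrightarrow \Delta L^{\alpha'}$ and $L^{\alpha} \hookrightarrow L^{\alpha'}$, which implies that the following diagram:

$$\begin{array}{ccc}
L^{\alpha} & \hookrightarrow & L^{\alpha'} \\
p_{\alpha} \uparrow & & \uparrow p_{\alpha'} \\
\Delta L^{\alpha} & \hookrightarrow & \Delta L^{\alpha'}
\end{array} \qquad (3)$$

induces commutative diagrams at homology and homotopy levels. Moreover, since $\{L^{\alpha}\}$ is an open
cover of $L^{\alpha}$, which is paracompact, $p_{\alpha}$ is a homotopy equivalence [31, Prop. 4G.2]. The same holds
for $p_{\alpha'}$, and therefore $p_{\alpha}$ and $p_{\alpha'}$ induce isomorphisms at homology and homotopy levels.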

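We now show that, similarly, there exist homotopy equivalences $\Delta L^{\alpha} \to \mathcal{C}^{\alpha}(L)$ and $\Delta L^{\alpha'} \to
\mathcal{C}^{\alpha'}(L)$ that commute with the canonical inclusions $\Delta L^{\alpha} \hookrightarrow \Delta L^{\alpha'}$ and $\mathcal{C}^{\alpha}(L) \hookrightarrow \mathcal{C}^{\alpha'}(L)$. This
follows in fact from the proof of Corollary 4G.3 of [31]. Indeed, using the notion of complex of
spaces introduced in [31, Section 4G], it can be shown that $\Delta L^{\alpha}$ is the realization of the complex of
spaces associated with the cover $\{L^{\alpha}\}$ (see the proof of [31, Prop. 4G.2]). Its base is the barycentric
subdivision $\Gamma^{\alpha}$ of $\mathcal{C}^{\alpha}(L)$, where each vertex corresponds to a non-empty finite intersection $B_S(\alpha)$
for some $S \subseteq L$, and where each edge connecting two vertices $S \subset S'$ corresponds to the canonical
inclusion $B_{S'}(\alpha) \hookrightarrow B_S(\alpha)$. In the same way, $\Delta L^{\alpha'}$ is the realization of a complex of spaces built
over the barycentric subdivision $\Gamma^{\alpha'}$ of $\mathcal{C}^{\alpha'}(L)$. Now, since the non-empty finite intersections $B_S(\alpha)$
(resp. $B_S(\alpha')$) are contractible, the map $q_{\alpha} : \Delta L^{\alpha} \to \Gamma^{\alpha}$ (resp. $q_{\alpha'} : \Delta L^{\alpha'} \to \Gamma^{\alpha'}$) induced by
sending each open set $B_S(\alpha)$ (resp. $B_S(\alpha')$) to a point is a homotopy equivalence [31, Prop. 4G.1
and Corol. 4G.3]. Furthermore, by construction, $q_{\alpha}$ is the restriction of $q_{\alpha'}$ to $\Delta L^{\alpha}$. Therefore,

$$\begin{array}{ccc}
\Delta L^{\alpha} & \hookrightarrow & \Delta L^{\alpha'} \\
q_{\alpha} \downarrow & & \downarrow q_{\alpha'} \\
\Gamma^{\alpha} & \hookrightarrow & \Gamma^{\alpha'}
\end{array} \qquad (4)$$

is a commutative diagram where vertical arrows are homotopy equivalences. Now, it is well-known
that $\Gamma^{\alpha}$ and $\Gamma^{\alpha'}$ are homeomorphic to $\mathcal{C}^{\alpha}(L)$ and $\mathcal{C}^{\alpha'}(L)$ respectively, and that the homeomorphisms
commute with the inclusion. Combined with (3) and (4), this fact proves Lemma 3.4. □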

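Combining Lemmas 3.3 and 3.4, we obtain the following key result:

Theorem 3.5 Let $X$ be a compact set and $L$ a finite set in $\mathbb{R}^d$, such that $d_H(X, L) < \varepsilon$ for
some $\varepsilon < \frac{1}{4}\,\mathrm{wfs}(X)$. Then, for all $\alpha, \alpha' \in [\varepsilon, \mathrm{wfs}(X) - \varepsilon]$ such that $\alpha' - \alpha > 2\varepsilon$, and for all
$\lambda \in (0, \mathrm{wfs}(X))$, we have: $\forall k \in \mathbb{N}$, $H_k(X^{\lambda}) \cong \mathrm{im}\, j_*$, where $j_* : H_k(\mathcal{C}^{\alpha}(L)) \to H_k(\mathcal{C}^{\alpha'}(L))$ is the
homomorphism between homology groups induced by the canonical inclusion $j : \mathcal{C}^{\alpha}(L) \hookrightarrow \mathcal{C}^{\alpha'}(L)$.
Given an arbitrary point $x_0 \in X$, the same result holds for homotopy groups with base-point $x_0$.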

Using the terminology of [38], this result means that the homology of $X^{\lambda}$ can be deduced from the
persistent homology of the filtration $\{\mathcal{C}^{\alpha}(L)\}_{\alpha \geq 0}$ by removing the cycles of persistence less than
$2\varepsilon$. Equivalently, the amplitude of the topological noise in the persistence barcode of $\{\mathcal{C}^{\alpha}(L)\}_{\alpha \geq 0}$
is bounded by $2\varepsilon$, i.e. the intervals of length at least $2\varepsilon$ in the barcode give the homology of $X^{\lambda}$.
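The rank statement of Theorem 3.5 can be read directly off a barcode. The following is a minimal sketch, assuming the barcode is given as a list of half-open intervals `(birth, death)`; the function name `persistent_betti` and this input format are illustrative choices, not part of the paper.

```python
import math
from typing import List, Tuple

# One barcode interval [birth, death); death may be math.inf.
Interval = Tuple[float, float]


def persistent_betti(barcode: List[Interval], alpha: float, alpha_prime: float) -> int:
    """Rank of the map H_k(C^alpha) -> H_k(C^alpha') induced by inclusion.

    A homology class contributes to the rank exactly when it already exists
    at scale alpha and is still alive at scale alpha_prime.
    """
    return sum(1 for birth, death in barcode if birth <= alpha and death > alpha_prime)


# Hypothetical barcode: one essential class, one short-lived class, one long class.
example = [(0.0, math.inf), (0.1, 0.3), (0.2, 2.0)]
print(persistent_betti(example, 0.5, 1.0))
```

With $\alpha' - \alpha > 2\varepsilon$, any interval shorter than $2\varepsilon$ cannot span $[\alpha, \alpha']$ and is therefore never counted, which is the sense in which the noise is bounded by $2\varepsilon$.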

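3.1.2 Filtrations intertwined with the Čech filtration

Using Lemma 2.1 and Theorem 3.5, we get the following guarantees on the Rips filtration:

Theorem 3.6 Let $X \subset \mathbb{R}^d$ be a compact set, and $L \subset \mathbb{R}^d$ a finite set such that $d_H(X, L) < \varepsilon$
for some $\varepsilon < \frac{1}{9}\,\mathrm{wfs}(X)$. Then, for all $\alpha \in [2\varepsilon, \frac{1}{4}(\mathrm{wfs}(X) - \varepsilon)]$ and all $\lambda \in (0, \mathrm{wfs}(X))$, we have:
$\forall k \in \mathbb{N}$, $H_k(X^{\lambda}) \cong \mathrm{im}\, j_*$, where $j_* : H_k(\mathcal{R}^{\alpha}(L)) \to H_k(\mathcal{R}^{4\alpha}(L))$ is the homomorphism between
homology groups induced by the canonical inclusion $j : \mathcal{R}^{\alpha}(L) \hookrightarrow \mathcal{R}^{4\alpha}(L)$.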
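Proof. From Lemma 2.1 we deduce the following sequence of inclusions:

$$\mathcal{C}^{\alpha/2}(L) \hookrightarrow \mathcal{R}^{\alpha}(L) \hookrightarrow \mathcal{C}^{\alpha}(L) \hookrightarrow \mathcal{C}^{2\alpha}(L) \hookrightarrow \mathcal{R}^{4\alpha}(L) \hookrightarrow \mathcal{C}^{4\alpha}(L) \qquad (5)$$

Since $\alpha \geq 2\varepsilon$, Theorem 3.5 implies that Eq. (5) induces a sequence of homomorphisms between
homology groups, such that $H_k(\mathcal{C}^{\alpha/2}(L)) \to H_k(\mathcal{C}^{2\alpha}(L))$ and $H_k(\mathcal{C}^{\alpha}(L)) \to H_k(\mathcal{C}^{4\alpha}(L))$ have ranks
equal to $\dim H_k(X^{\lambda})$. Therefore, by the Sandwich Lemma 3.2, $\mathrm{rank}\, j_*$ is also equal to $\dim H_k(X^{\lambda})$.
It follows that $\mathrm{im}\, j_* \cong H_k(X^{\lambda})$, since our ring of coefficients is a field. □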

Similarly, Corollary 2.3 provides the following sequence of inclusions:

from which follows a result similar to Theorem 3.6 on the witness complex, by the same proof:

Theorem 3.7 Let X be a compact subset of'K'^, and Zet L C C M'^ be such that L is finite. As- 
sume that d'H{X,W) < 6 and that d-^(W,L) < e, with 6 < e < min {| diamcc(-'^)) nW wfs(X)}. 
Then, for all a G [4e, (wfs(X) - e)] and all X e (0, wfs(X)), we have: Vfc G N, Hk{X^) = im j^,, 
where : Hk{C^^{L)) — > Hk{C'^°'{L)) is the homomorphism between homology groups induced by 
the canonical inclusion j : Cy^{L) ^ Cy^"(L). 

More generally, the above arguments show that the homology of $X^{\lambda}$ can be recovered from the
persistence barcode of any filtration $\{F_{\alpha}\}_{\alpha \geq 0}$ that is intertwined with the Čech filtration in the
sense of Lemmas 2.1 and 2.2. Note however that Theorems 3.6 and 3.7 suggest a different behavior
of the barcode in this case, since its topological noise might scale up with $\alpha$ (specifically, it might be
up to linear in $\alpha$), whereas it is uniformly bounded by a constant in the case of the Čech filtration.
This difference of behavior is easily explained by the way $\{F_{\alpha}\}_{\alpha \geq 0}$ is intertwined with the Čech
filtration. A trick to get a uniformly-bounded noise is to represent the barcode of $\{F_{\alpha}\}_{\alpha \geq 0}$ on a
logarithmic scale, that is, with $\log_2 \alpha$ instead of $\alpha$ in abscissa.
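To illustrate the logarithmic-scale trick in the Rips case, here is a small sketch. It assumes the multiplicative factor 4 of the inclusion $\mathcal{R}^{\alpha}(L) \hookrightarrow \mathcal{R}^{4\alpha}(L)$ from Theorem 3.6; the function names and the `factor` parameter are hypothetical. An interval $[b, d)$ with $d \leq 4b$ can never span a pair $(\alpha, 4\alpha)$, and on a $\log_2$ scale every such interval has length at most $\log_2 4 = 2$, independently of $b$.

```python
import math


def log2_length(birth: float, death: float) -> float:
    """Length of the interval [birth, death) once the abscissa is log2(alpha).

    Requires birth > 0; an infinite death gives an infinite length.
    """
    return math.log2(death) - math.log2(birth)


def is_noise_on_log_scale(birth: float, death: float, factor: float = 4.0) -> bool:
    """True if the interval is too short to survive any inclusion alpha -> factor * alpha."""
    return log2_length(birth, death) <= math.log2(factor)


# The same multiplicative gap has the same log-length at every scale.
print(log2_length(1.0, 4.0), log2_length(8.0, 32.0))
```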

3.2 Results on homotopy 

The results on homology obtained in Section 3.1 follow from simple algebraic arguments. Using
a more geometric approach, we can get similar results on homotopy. From now on, $x_0 \in X$ is a
fixed point and all the homotopy groups $\pi_k(X) = \pi_k(X, x_0)$ are assumed to be with base-point $x_0$.
Theorems 3.6 and 3.7 can be extended to homotopy in the following way:

Theorem 3.8 Under the same hypotheses as in Theorem 3.6, we have: $\forall k \in \mathbb{N}$, $\pi_k(X^{\lambda}) \cong \mathrm{im}\, j_*$,
where $j_* : \pi_k(\mathcal{R}^{\alpha}(L)) \to \pi_k(\mathcal{R}^{4\alpha}(L))$ is the homomorphism between homotopy groups induced by
the canonical inclusion $j : \mathcal{R}^{\alpha}(L) \hookrightarrow \mathcal{R}^{4\alpha}(L)$.

Theorem 3.9 Under the same hypotheses as in Theorem \3. ?t we have: \/k G N, TTk{X'^) = im j*, 
where j^, : ^^k{C^r{L)) 7rfc(Cjy°(L)) is the homomorphism between homotopy groups induced by 
the canonical inclusion j : Cy^r{L) Cjy"(L). 

The proofs of these two results being mostly identical, we focus exclusively on the Rips complex.
We will use the following lemma, which is an immediate generalization of Proposition 4.1 of [12]:

Lemma 3.10 Let $X$ be a compact set and $L$ a finite set in $\mathbb{R}^d$, such that $d_H(X, L) < \varepsilon$ for some
$\varepsilon < \frac{1}{4}\,\mathrm{wfs}(X)$. Let $\alpha, \alpha' \in [\varepsilon, \mathrm{wfs}(X) - \varepsilon]$ be such that $\alpha' - \alpha > 2\varepsilon$. Given $k \in \mathbb{N}$, two $k$-loops
$\sigma_1, \sigma_2 : S^k \to (L^{\alpha}, x_0)$ in $L^{\alpha}$ are homotopic in $X^{\alpha' - \varepsilon}$ if and only if they are homotopic in $L^{\alpha'}$.
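Proof of Theorem 3.8. As mentioned at the beginning of the proof of Lemma 3.3, we can
assume without loss of generality that $2\varepsilon < \alpha < \frac{1}{4}(\mathrm{wfs}(X) - \varepsilon)$. Consider the following sequence of
inclusions:

$$\mathcal{C}^{\alpha/2}(L) \subseteq \mathcal{R}^{\alpha}(L) \subseteq \mathcal{C}^{\alpha}(L) \subseteq \mathcal{C}^{2\alpha}(L) \subseteq \mathcal{R}^{4\alpha}(L) \subseteq \mathcal{C}^{4\alpha}(L)$$

We use the homotopy equivalences $h_{\beta} : L^{\beta} \to \mathcal{C}^{\beta}(L)$ provided by Lemma 3.4 for all values $\beta > 0$,
which commute with inclusions at homotopy level. Note that, for any element $\sigma$ of $\pi_k(\mathcal{C}^{\beta}(L))$,
there exists a $k$-loop in $L^{\beta}$ that is mapped through $h_{\beta}$ to a $k$-loop representing the homotopy class
$\sigma$. In the following, we denote by $\sigma_g$ such a $k$-loop. Let $E$, $F$ and $G$ be the images of $\pi_k(\mathcal{C}^{\alpha/2}(L))$
in $\pi_k(\mathcal{C}^{\alpha}(L))$, $\pi_k(\mathcal{C}^{2\alpha}(L))$ and $\pi_k(\mathcal{C}^{4\alpha}(L))$ respectively, through the homomorphisms induced by
inclusion. We thus have a sequence of surjective homomorphisms:

$$\pi_k(\mathcal{C}^{\alpha/2}(L)) \to E \to F \to G$$

Note that, by Theorem 3.5, $F$ and $G$ are isomorphic to $\pi_k(X^{\lambda})$. Let $\sigma \in F$ be a homotopy
class. Since $F$ is the image of $\pi_k(\mathcal{C}^{\alpha/2}(L))$, we can assume without loss of generality that $\sigma_g \subseteq L^{\alpha/2}$.
Assume that the image of $\sigma$ in $G$ is equal to 0. This means that $\sigma_g$ is null-homotopic in $L^{4\alpha}$
and, since $L^{4\alpha} \subseteq X^{4\alpha + \varepsilon}$, $\sigma_g$ is also null-homotopic in $X^{4\alpha + \varepsilon}$. But $\sigma_g \subseteq L^{\alpha/2} \subseteq X^{\alpha/2 + \varepsilon}$, and $X^{4\alpha + \varepsilon}$
deformation retracts onto $X^{\alpha/2 + \varepsilon}$, by the Isotopy Lemma 3.1. As a consequence, $\sigma_g$ is null-homotopic
in $X^{\alpha/2 + \varepsilon}$, which is contained in $L^{2\alpha}$ since $\frac{\alpha}{2} + 2\varepsilon < 2\alpha$. Hence, $\sigma_g$ is null-homotopic in $L^{2\alpha}$, namely:
$\sigma = 0$ in $F$. So, the homomorphism $F \to G$ is injective, and thus it is an isomorphism. As
a consequence, $F \to \pi_k(\mathcal{R}^{4\alpha}(L))$ is injective, and it is now sufficient to prove that the image of
$\phi_* : \pi_k(\mathcal{R}^{\alpha}(L)) \to \pi_k(\mathcal{C}^{2\alpha}(L))$ induced by the inclusion is equal to $F$.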

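Obviously, $F$ is contained in the image of $\phi_*$. Now, let $\sigma \in \pi_k(\mathcal{R}^{\alpha}(L))$ and let $\phi_*(\sigma)_g$ be a
$k$-loop in $L^{2\alpha}$ that is mapped through $h_{2\alpha}$ to a $k$-loop representing the homotopy class $\phi_*(\sigma)$. Since
$\phi_*(\sigma)$ is in the image of $\phi_*$, and since $\mathcal{R}^{\alpha}(L) \subseteq \mathcal{C}^{\alpha}(L)$, we can assume that $\phi_*(\sigma)_g$ is contained in
$L^{\alpha}$. Let $\tilde{\sigma}_g$ be the image of $\phi_*(\sigma)_g$ through a deformation retraction of $X^{\alpha + \varepsilon}$ onto $X^{\alpha_0}$, where
$0 < \alpha_0 < \frac{\alpha}{2}$ is such that $\frac{\alpha}{2} - \alpha_0 > \varepsilon$. Obviously, $\tilde{\sigma}_g$ and $\phi_*(\sigma)_g$ are homotopic in $X^{\alpha + \varepsilon}$. It follows
then from Lemma 3.10 that $\tilde{\sigma}_g$ and $\phi_*(\sigma)_g$ are homotopic in $L^{2\alpha}$. And since $\tilde{\sigma}_g$ is contained in
$X^{\alpha_0} \subseteq L^{\alpha/2}$, the equivalence class of $h_{\alpha/2}(\tilde{\sigma}_g)$ in $\pi_k(\mathcal{C}^{\alpha/2}(L))$ is mapped to $\phi_*(\sigma) \in \pi_k(\mathcal{C}^{2\alpha}(L))$ through
the homomorphism induced by $\mathcal{C}^{\alpha/2}(L) \hookrightarrow \mathcal{C}^{2\alpha}(L)$, which commutes with the homotopy equivalences.
As a result, $\phi_*(\sigma)$ belongs to $F$, which is therefore equal to $\mathrm{im}\, \phi_*$. □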



4 The case of smooth submanifolds of $\mathbb{R}^d$

In this section, we consider the case of submanifolds $X$ of $\mathbb{R}^d$ that have positive reach. Recall that
the reach of $X$, or $\mathrm{rch}(X)$ for short, is the minimum distance between the points of $X$ and the
points of its medial axis [1]. A point cloud $L \subset X$ is an $\varepsilon$-sample of $X$ if every point of $X$ lies
within distance $\varepsilon$ of $L$. In addition, $L$ is $\varepsilon$-sparse if its points lie at least $\varepsilon$ away from one another.
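The two sampling conditions are easy to test on finite data. The sketch below assumes that $X$ is represented by a finite set of points `X_points` (a discretization, which the definition above does not require) and uses the Euclidean distance; the function names are illustrative.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def is_eps_sample(L: Sequence[Point], X_points: Sequence[Point], eps: float) -> bool:
    """Every point of (the finite proxy for) X lies within distance eps of L."""
    return all(min(math.dist(x, v) for v in L) <= eps for x in X_points)


def is_eps_sparse(L: Sequence[Point], eps: float) -> bool:
    """The points of L lie at least eps away from one another."""
    return all(math.dist(u, v) >= eps for u, v in combinations(L, 2))


# Two landmarks on a unit segment discretized by three points.
L = [(0.0, 0.0), (1.0, 0.0)]
X_points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
print(is_eps_sample(L, X_points, 0.5), is_eps_sparse(L, 1.0))
```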

Our main result is a first attempt at quantifying a conjecture of Carlsson and de Silva [18],
according to which the witness complex filtration should have cleaner persistence barcodes than
the Čech and Rips filtrations, at least on smooth submanifolds of $\mathbb{R}^d$. By cleaner is meant that
the amplitude of the topological noise in the barcodes should be smaller, and also that the long
intervals should appear earlier. We prove this latter statement correct, at least to some extent:

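Theorem 4.1 There exist a constant $\varrho > 0$ and a continuous, non-decreasing map $\bar{\omega} : [0, \varrho) \to [0, \frac{1}{2})$,
such that, for any submanifold $X$ of $\mathbb{R}^d$, for all $\varepsilon, \delta$ satisfying $0 < \delta \leq \varepsilon < \varrho\, \mathrm{rch}(X)$, for any $\delta$-sample
$W$ of $X$ and any $\varepsilon$-sparse $\varepsilon$-sample $L$ of $W$, $\mathcal{C}_W^{\alpha}(L)$ contains a subcomplex $\mathcal{D}$ homeomorphic
to $X$ and such that the canonical inclusion $\mathcal{D} \hookrightarrow \mathcal{C}_W^{\alpha}(L)$ induces an injective homomorphism between
homology groups, provided that $\alpha$ satisfies:
$$\frac{2}{1 - \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2} \left( \delta + \bar{\omega}\left(\tfrac{\varepsilon}{\mathrm{rch}(X)}\right)^2 \varepsilon \right) \;\leq\; \alpha \;<\; \frac{1}{2}\, \mathrm{rch}(X) - \left(3 + \frac{\sqrt{2}}{2}\right)(\varepsilon + \delta).$$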

This theorem guarantees that, for values of $\alpha$ ranging from $O\left(\delta + \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2 \varepsilon\right)$ to $\Omega(\mathrm{rch}(X))$, the
topology of $X$ is captured by a subcomplex $\mathcal{D}$ that injects itself suitably in $\mathcal{C}_W^{\alpha}(L)$. As a result,
long intervals showing the homology of $X$ appear around $\alpha = O\left(\delta + \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2 \varepsilon\right)$ in the persistence
barcode of the witness complex filtration. This can be much sooner than the time $\alpha = 2\varepsilon$ prescribed
by Theorem 3.7, since $\bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)$ can be arbitrarily small. Specifically, the denser the landmark set
$L$, the smaller the ratio $\frac{\varepsilon}{\mathrm{rch}(X)}$, and therefore the smaller $\delta + \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2 \varepsilon$ compared to $2\varepsilon$. We have
reasons to believe that this upper bound on the appearance time of long bars is tight. In particular,
the bound cannot depend solely on $\delta$, since otherwise, in the limit case where $\delta = 0$, we would
get that the homology groups of $X$ can be injected into the ones of the standard witness complex
$\mathcal{C}_W(L)$, which is known to be false [30, 35]. The same argument implies that the amplitude of the
topological noise in the barcode cannot depend solely on $\delta$ either. However, whether the upper
bound $O(\varepsilon)$ on the amplitude of the noise can be improved or not is still an open question.

Our proof of Theorem 4.1 generalizes an argument used in [26] for the planar case, which
stresses the close relationship that exists between the $\alpha$-witness complex and the so-called weighted
restricted Delaunay triangulation $\mathcal{D}_{\omega}^{X}(L)$. Given a submanifold $X$ of $\mathbb{R}^d$, a finite landmark set
$L \subset \mathbb{R}^d$, and an assignment of non-negative weights to the landmarks, specified through a map
$\omega : L \to [0, \infty)$, $\mathcal{D}_{\omega}^{X}(L)$ is the nerve of the restriction to $X$ of the power diagram of the weighted
set $L$. Under the hypotheses of the theorem, we show that $\mathcal{C}_W^{\alpha}(L)$ contains $\mathcal{D}_{\omega}^{X}(L)$, which, by a
result of Cheng et al. [14] (see Theorem 4.2 below), is homeomorphic to $X$. The main point of the
proof is then to show that $\mathcal{D}_{\omega}^{X}(L)$ injects itself nicely into $\mathcal{C}_W^{\alpha}(L)$.

The rest of the section is devoted to the proof of Theorem 4.1. After introducing the weighted
restricted Delaunay triangulation in Section 4.1, and stressing its relationship with the $\alpha$-witness
complex in Section 4.2, we detail the proof of Theorem 4.1 in Section 4.3.

4.1 The weighted restricted Delaunay triangulation 

Given a finite point set $L \subset \mathbb{R}^d$, an assignment of weights over $L$ is a non-negative real-valued
function $\omega : L \to [0, \infty)$. The quantity $\max_{u \in L,\, v \in L \setminus \{u\}} \frac{\omega(u)}{\|u - v\|}$ is called the relative amplitude of $\omega$.
Given $p \in \mathbb{R}^d$, the weighted distance from $p$ to some weighted point $v \in L$ is $\|p - v\|^2 - \omega(v)^2$. This
is actually not a metric, since it is not symmetric. Given a finite point set $L$ and an assignment
of weights $\omega$ over $L$, we denote by $\mathcal{V}_{\omega}(L)$ the power diagram of the weighted set $L$, and by $\mathcal{D}_{\omega}(L)$
its nerve, also known as the weighted Delaunay triangulation. If the relative amplitude of $\omega$ is at
most $\frac{1}{2}$, then the points of $L$ have non-empty cells in $\mathcal{V}_{\omega}(L)$, and in fact each point of $L$ belongs to
its own cell [13]. For any simplex $\sigma$ of $\mathcal{D}_{\omega}(L)$, $\mathcal{V}_{\omega}(\sigma)$ denotes the face of $\mathcal{V}_{\omega}(L)$ dual to $\sigma$.

Given a subset $X$ of $\mathbb{R}^d$, we call $\mathcal{V}_{\omega}^{X}(L)$ the restriction of $\mathcal{V}_{\omega}(L)$ to $X$, and we denote by $\mathcal{D}_{\omega}^{X}(L)$
its nerve, also known as the weighted Delaunay triangulation of $L$ restricted to $X$. Observe that
$\mathcal{D}_{\omega}^{X}(L)$ is a subcomplex of $\mathcal{D}_{\omega}(L)$. In the special case where all the weights are equal, $\mathcal{V}_{\omega}(L)$ and
$\mathcal{D}_{\omega}(L)$ coincide with their standard Euclidean versions, $\mathcal{V}(L)$ and $\mathcal{D}(L)$. Similarly, $\mathcal{V}_{\omega}(\sigma)$ becomes
$\mathcal{V}(\sigma)$, and $\mathcal{V}_{\omega}^{X}(L)$ and $\mathcal{D}_{\omega}^{X}(L)$ become respectively $\mathcal{V}^{X}(L)$ and $\mathcal{D}^{X}(L)$.
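As a concrete reading of these definitions, the sketch below computes the weighted (power) distance, the relative amplitude of a weight assignment, and the index of the landmark whose power cell contains a query point. The names `power_distance`, `relative_amplitude` and `power_cell_owner` are illustrative, and weights are passed as a list aligned with the landmarks.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def power_distance(p: Point, v: Point, weight: float) -> float:
    """Weighted distance ||p - v||^2 - w(v)^2 from p to the weighted point v."""
    return math.dist(p, v) ** 2 - weight ** 2


def relative_amplitude(L: Sequence[Point], weights: Sequence[float]) -> float:
    """max over u in L and v in L minus {u} of w(u) / ||u - v|| (needs at least 2 distinct points)."""
    return max(
        weights[i] / math.dist(L[i], L[j])
        for i in range(len(L))
        for j in range(len(L))
        if i != j
    )


def power_cell_owner(p: Point, L: Sequence[Point], weights: Sequence[float]) -> int:
    """Index of a landmark minimizing the weighted distance to p (ties broken by lowest index)."""
    return min(range(len(L)), key=lambda i: power_distance(p, L[i], weights[i]))


L = [(0.0, 0.0), (2.0, 0.0)]
weights = [0.0, 1.0]
print(relative_amplitude(L, weights), power_cell_owner((1.0, 0.0), L, weights))
```

Giving a landmark a positive weight enlarges its cell: in the usage above the midpoint of the two landmarks falls in the cell of the weighted one.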

Theorem 4.2 (Lemmas 13, 14, 18 of [14], see also Theorem 2.5 of [4]) There exist a constant
$\varrho > 0$ and a non-decreasing continuous map $\bar{\omega} : [0, \varrho) \to [0, \frac{1}{2})$, such that, for any manifold
$X$ and any $\varepsilon$-sparse $2\varepsilon$-sample $L$ of $X$, with $\varepsilon < \varrho\, \mathrm{rch}(X)$, there is an assignment of weights $\omega$ of
relative amplitude at most $\bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)$ such that $\mathcal{D}_{\omega}^{X}(L)$ is homeomorphic to $X$.

^More on power diagrams and on restricted triangulations can be found in and [23] respectively.
^Note that $\varrho$ and $\bar{\omega}$ are the same as in Theorem 4.1. In fact, these quantities come from Theorem 4.2.



This theorem guarantees that the topology of $X$ is captured by $\mathcal{D}_{\omega}^{X}(L)$ provided that the landmarks
are sufficiently densely sampled on $X$, and that they are assigned suitable weights. Observe that
the denser the landmark set, the smaller the weights are required to be, as specified by the map
$\bar{\omega}$. In the particular case where $X$ is a curve or a surface, $\bar{\omega}$ can be taken to be the constant zero
map, since $\mathcal{D}^{X}(L)$ is homeomorphic to $X$ [1, 2]. On higher-dimensional manifolds though, positive
weights are required, since $\mathcal{D}^{X}(L)$ may fail to capture the topological invariants of $X$ [35].

The proof of the theorem given in [14] shows that $\mathcal{V}_{\omega}^{X}(L)$ satisfies the so-called closed ball
property, which states that every face of the weighted Voronoi diagram $\mathcal{V}_{\omega}(L)$ intersects the manifold
$X$ along a topological ball of proper dimension, if at all. Under this condition, there exists a
homeomorphism $h_0$ between the nerve $\mathcal{D}_{\omega}^{X}(L)$ and $X$, as proved by Edelsbrunner and Shah [23].
Furthermore, $h_0$ sends every simplex of $\mathcal{D}_{\omega}^{X}(L)$ to a subset of the union of the restricted Voronoi
cells of its vertices, that is: $\forall \sigma \in \mathcal{D}_{\omega}^{X}(L)$, $h_0(\sigma) \subseteq \bigcup_{v \text{ vertex of } \sigma} \mathcal{V}_{\omega}(v) \cap X$. This fact will be
instrumental in the proof of Theorem 4.1.

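4.2 Relationship between $\mathcal{D}_{\omega}^{X}(L)$ and $\mathcal{C}_W^{\alpha}(L)$

As mentioned in the introduction, the use of the witness complex filtration for topological data analysis
is motivated by its close relationship with the weighted restricted Delaunay triangulation:

Lemma 4.3 Let $X$ be a compact subset of $\mathbb{R}^d$, $W \subseteq X$ a $\delta$-sample of $X$, and $L \subseteq W$ an $\varepsilon$-sparse
$\varepsilon$-sample of $W$. Then, for all assignment of weights $\omega$ of relative amplitude $\bar{\omega} \leq \frac{1}{2}$, $\mathcal{D}_{\omega}^{X}(L)$ is
included in $\mathcal{C}_W^{\alpha}(L)$ whenever $\alpha \geq \frac{2}{1 - \bar{\omega}^2}\left(\delta + \bar{\omega}^2 \varepsilon\right)$.

This result implies in particular that $\mathcal{D}^{X}(L)$ is included in $\mathcal{C}_W^{\alpha}(L)$ whenever $\alpha \geq 2\delta$, since $\mathcal{D}^{X}(L)$
is nothing but $\mathcal{D}_{\omega}^{X}(L)$ for an assignment of weights of relative amplitude zero.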

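Proof. Let $\sigma$ be a simplex of $\mathcal{D}_{\omega}^{X}(L)$. If $\sigma$ is a vertex, then it clearly belongs to $\mathcal{C}_W^{\alpha}(L)$
for all $\alpha \geq 0$, since $L \subseteq W$. Assume now that $\sigma$ has positive dimension, and consider a point
$c \in \mathcal{V}_{\omega}(\sigma) \cap X$. For any vertex $v$ of $\sigma$ and any point $p$ of $L$ (possibly equal to $v$), we have:
$\|v - c\|^2 - \omega(v)^2 \leq \|p - c\|^2 - \omega(p)^2$, which yields: $\|v - c\|^2 \leq \|p - c\|^2 + \omega(v)^2 - \omega(p)^2$. Now, $\omega(p)^2$
is non-negative, while $\omega(v)^2$ is at most $\bar{\omega}^2 \|v - p\|^2$, which gives: $\|v - c\|^2 \leq \|p - c\|^2 + \bar{\omega}^2 \|v - p\|^2$.
Replacing $\|v - p\|$ by $\|v - c\| + \|p - c\|$, we get a semi-algebraic expression of degree 2 in $\|v - c\|$, namely:
$(1 - \bar{\omega}^2)\|v - c\|^2 - 2\bar{\omega}^2 \|p - c\| \|v - c\| - (1 + \bar{\omega}^2)\|p - c\|^2 \leq 0$. It follows that $\|v - c\| \leq \frac{1 + \bar{\omega}^2}{1 - \bar{\omega}^2} \|p - c\|$.
Let now $w$ be a point of $W$ closest to $c$ in the Euclidean metric. Using the triangle inequality and
the fact that $\|w - c\| \leq \delta$, we get: $\|v - w\| \leq \|v - c\| + \|w - c\| \leq \frac{1 + \bar{\omega}^2}{1 - \bar{\omega}^2} \|p - c\| + \delta$. This holds
for any point $p \in L$, and in particular for the nearest neighbor $p_w$ of $w$ in $L$. Therefore, we have
$\|v - w\| \leq \frac{1 + \bar{\omega}^2}{1 - \bar{\omega}^2} \|p_w - c\| + \delta$, which is at most $\frac{1 + \bar{\omega}^2}{1 - \bar{\omega}^2} \left(\|p_w - w\| + \delta\right) + \delta \leq \|p_w - w\| + \frac{2}{1 - \bar{\omega}^2}\left(\delta + \bar{\omega}^2 \varepsilon\right)$
because $\|w - c\| \leq \delta$ and $\|w - p_w\| \leq \varepsilon$. Since this inequality holds for any vertex $v$ of $\sigma$, and since
the Euclidean distances from $w$ to all the landmarks are at least $\|p_w - w\|$, $w$ is an $\alpha$-witness of $\sigma$
and of all its faces as soon as $\alpha \geq \frac{2}{1 - \bar{\omega}^2}\left(\delta + \bar{\omega}^2 \varepsilon\right)$. Since this holds for every simplex $\sigma$ of $\mathcal{D}_{\omega}^{X}(L)$,
the lemma follows. □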
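For readers who want the two algebraic steps of this proof spelled out, here is a short worked derivation. The shorthands $x = \|v - c\|$, $y = \|p - c\|$ and $r = \|p_w - w\|$ are introduced here for convenience and are not notation from the paper; the derivation only uses $0 \leq \bar{\omega} < 1$ and $r \leq \varepsilon$.

```latex
\begin{align*}
(1-\bar{\omega}^2)\,x^2 - 2\bar{\omega}^2\,xy - (1+\bar{\omega}^2)\,y^2
  &= \bigl((1-\bar{\omega}^2)\,x - (1+\bar{\omega}^2)\,y\bigr)(x+y) \;\le\; 0
  \;\Longrightarrow\; x \le \frac{1+\bar{\omega}^2}{1-\bar{\omega}^2}\,y
  \quad (\text{since } x + y \ge 0), \\[4pt]
\frac{1+\bar{\omega}^2}{1-\bar{\omega}^2}\,(r+\delta) + \delta
  &= r + \frac{2\bar{\omega}^2}{1-\bar{\omega}^2}\,r + \frac{2}{1-\bar{\omega}^2}\,\delta
  \;\le\; r + \frac{2}{1-\bar{\omega}^2}\,\bigl(\delta + \bar{\omega}^2\varepsilon\bigr)
  \quad (\text{since } r \le \varepsilon).
\end{align*}
```

Setting $\bar{\omega} = 0$ in the last bound recovers the unweighted threshold $2\delta$.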

4.3 Proof of Theorem 4.1

The proof is mostly algebraic, but it relies on two technical results. The first one is Dugundji's
extension theorem [20], which states that, given an abstract simplex $\sigma$ and a continuous map
$f : \partial\sigma \to \mathbb{R}^d$, $f$ can be extended to a continuous map $\tilde{f} : \sigma \to \mathbb{R}^d$ such that $\tilde{f}(\sigma)$ is included in
the Euclidean convex hull of $f(\partial\sigma)$, noted $\mathrm{CH}(f(\partial\sigma))$. This convexity property of $\tilde{f}$ is used in the
proof of the second technical result, stated as Lemma 4.5 and proved at the end of the section.

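Proof of Theorem 4.1. Since $\delta \leq \varepsilon$, $L$ is an $\varepsilon$-sparse $2\varepsilon$-sample of $X$, with $\varepsilon < \varrho\, \mathrm{rch}(X)$.
Therefore, by Theorem 4.2, there exists an assignment of weights $\omega$ over $L$, of relative amplitude
at most $\bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)$, such that $\mathcal{D}_{\omega}^{X}(L)$ is homeomorphic to $X$. Taking $\mathcal{D} = \mathcal{D}_{\omega}^{X}(L)$, we then have:
$\forall k \in \mathbb{N}$, $H_k(X) \cong H_k(\mathcal{D})$. Moreover, by Lemma 4.3, we know that $\mathcal{D} = \mathcal{D}_{\omega}^{X}(L)$ is included in
$\mathcal{C}_W^{\alpha}(L)$, since $\alpha \geq \frac{2}{1 - \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2}\left(\delta + \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)^2 \varepsilon\right)$, which is at least the bound required by Lemma 4.3
because the relative amplitude of $\omega$ is at most $\bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)$. There remains to
show that the inclusion map $j : \mathcal{D}_{\omega}^{X}(L) \hookrightarrow \mathcal{C}_W^{\alpha}(L)$ induces injective homomorphisms between the
homology groups of $\mathcal{D}_{\omega}^{X}(L)$ and $\mathcal{C}_W^{\alpha}(L)$, which will conclude the proof of the theorem.

Our approach to showing the injectivity of $j_*$ consists in building a continuous map $h :
\mathcal{C}_W^{\alpha}(L) \to \mathcal{D}_{\omega}^{X}(L)$ such that $h \circ j$ is homotopic to the identity in $\mathcal{D}_{\omega}^{X}(L)$. This implies that
$h_* \circ j_* : H_k(\mathcal{D}_{\omega}^{X}(L)) \to H_k(\mathcal{D}_{\omega}^{X}(L))$ is an isomorphism (in fact, it is the identity map), and thus
that $j_*$ is injective.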

We begin our construction with the homeomorphism $h_0 : \mathcal{D}_{\omega}^{X}(L) \to X$ provided by the theorem
of Edelsbrunner and Shah [23]. Taking $h_0$ as a map $\mathcal{D}_{\omega}^{X}(L) \to \mathbb{R}^d$, we extend it to a continuous
map $\tilde{h}_0 : \mathcal{C}_W^{\alpha}(L) \to \mathbb{R}^d$ by the following iterative process: while there exists a simplex $\sigma \in \mathcal{C}_W^{\alpha}(L)$
such that $\tilde{h}_0$ is defined over the boundary of $\sigma$ but not over its interior, apply Dugundji's extension
theorem, which extends $\tilde{h}_0$ to the entire simplex $\sigma$.

Lemma 4.4 The above iterative process extends $h_0$ to a map $\tilde{h}_0 : \mathcal{C}_W^{\alpha}(L) \to \mathbb{R}^d$.

Proof. We only need to prove that the process visits every simplex of $\mathcal{C}_W^{\alpha}(L)$. Assume for
a contradiction that the process terminates while there still remain some unvisited simplices of
$\mathcal{C}_W^{\alpha}(L)$. Consider one such simplex $\sigma$ of minimal dimension. Either $\sigma$ is a vertex, or there is at
least one proper face of $\sigma$ that has not yet been visited, since otherwise the process could visit $\sigma$.
In the former case, $\sigma$ is a point of $L$, and as such it is a vertex of $\mathcal{D}_{\omega}^{X}(L)$, which means that $\tilde{h}_0$ is
already defined over $\sigma$ (contradiction). In the latter case, we get a contradiction with the fact that
$\sigma$ is of minimal dimension. □

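Now that we have built a map $\tilde{h}_0 : \mathcal{C}_W^{\alpha}(L) \to \mathbb{R}^d$, our next step is to turn it into a map
$\mathcal{C}_W^{\alpha}(L) \to X$. To do so, we compose it with the projection $p_X$ that maps every point of $\mathbb{R}^d$ to its
nearest neighbor on $X$, if the latter is unique. This projection is known to be well-defined and
continuous over $\mathbb{R}^d \setminus M$, where $M$ denotes the medial axis of $X$ [24].

Lemma 4.5 Let $X, W, L, \delta, \varepsilon$ satisfy the hypotheses of Theorem 4.1. Then, $\tilde{h}_0(\mathcal{C}_W^{\alpha}(L)) \cap M = \emptyset$ as
long as $\alpha < \frac{1}{2}\, \mathrm{rch}(X) - \left(3 + \frac{\sqrt{2}}{2}\right)(\varepsilon + \delta)$.

Since by Lemma 4.5 we have $\tilde{h}_0(\mathcal{C}_W^{\alpha}(L)) \cap M = \emptyset$, the map $p_X \circ \tilde{h}_0 : \mathcal{C}_W^{\alpha}(L) \to X$ is well-defined and
continuous. Our final step is to compose it with $h_0^{-1}$, to get a continuous map $h = h_0^{-1} \circ p_X \circ \tilde{h}_0 :
\mathcal{C}_W^{\alpha}(L) \to \mathcal{D}_{\omega}^{X}(L)$. The restriction of $h$ to $\mathcal{D}_{\omega}^{X}(L)$ is simply $h_0^{-1} \circ p_X \circ h_0$, which coincides with
$h_0^{-1} \circ h_0 = \mathrm{id}$ since $h_0(\mathcal{D}_{\omega}^{X}(L)) = X$. It follows that $h \circ j$ is homotopic to the identity in $\mathcal{D}_{\omega}^{X}(L)$
(in fact, it is the identity), and therefore that the induced map $h_* \circ j_*$ is the identity. This implies
that $j_* : H_k(\mathcal{D}_{\omega}^{X}(L)) \to H_k(\mathcal{C}_W^{\alpha}(L))$ is injective, which concludes the proof of Theorem 4.1. □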

^Note that this map does not need to be simplicial, since we are using singular homology.

^Indeed, every point $p \in L$ lies on $X$ and belongs to its own cell, since $\omega$ has relative amplitude less than $\frac{1}{2}$.
Therefore, $\mathcal{V}_{\omega}(p) \cap X \neq \emptyset$, which means that $p$ is a vertex of $\mathcal{D}_{\omega}^{X}(L)$.
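We end the section by providing the proof of Lemma 4.5.

Proof of Lemma 4.5. First, we claim that the image through $\tilde{h}_0$ of any simplex of $\mathcal{C}_W^{\alpha}(L)$
is included in the Euclidean convex hull of the restricted Voronoi cells of its vertices, that is:
$\forall \sigma \in \mathcal{C}_W^{\alpha}(L)$, $\tilde{h}_0(\sigma) \subseteq \mathrm{CH}\left(\bigcup_{v \text{ vertex of } \sigma} \mathcal{V}_{\omega}(v) \cap X\right)$. This is clearly true if $\sigma$ belongs to $\mathcal{D}_{\omega}^{X}(L)$,
since in this case we have $\tilde{h}_0(\sigma) = h_0(\sigma) \subseteq \bigcup_{v \text{ vertex of } \sigma} \mathcal{V}_{\omega}(v) \cap X$, as mentioned after Theorem 4.2.
Now, if the property holds for all the proper faces of a simplex $\sigma \in \mathcal{C}_W^{\alpha}(L)$, then by induction
it also holds for the simplex itself. Indeed, for each proper face $\tau \subset \sigma$, we have
$\tilde{h}_0(\tau) \subseteq \mathrm{CH}\left(\bigcup_{v \text{ vertex of } \tau} \mathcal{V}_{\omega}(v) \cap X\right) \subseteq \mathrm{CH}\left(\bigcup_{v \text{ vertex of } \sigma} \mathcal{V}_{\omega}(v) \cap X\right)$. Therefore, $\mathrm{CH}\left(\bigcup_{v \text{ vertex of } \sigma} \mathcal{V}_{\omega}(v) \cap X\right)$
contains $\mathrm{CH}\left(\tilde{h}_0(\partial\sigma)\right)$, which, by Dugundji's extension theorem, contains $\tilde{h}_0(\sigma)$. Therefore, the
property holds for every simplex of $\mathcal{C}_W^{\alpha}(L)$.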

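We can now prove that the image through $\tilde{h}_0$ of any arbitrary simplex $\sigma$ of $\mathcal{C}_W^{\alpha}(L)$ does not
intersect the medial axis of $X$. This is clearly true if $\sigma$ is a simplex of $\mathcal{D}_{\omega}^{X}(L)$, since in this case
$\tilde{h}_0(\sigma) = h_0(\sigma)$ is included in $X$. Assume now that $\sigma \notin \mathcal{D}_{\omega}^{X}(L)$. In particular, $\sigma$ is not a vertex. Let
$v$ be an arbitrary vertex of $\sigma$. Consider any other vertex $u$ of $\sigma$. Edge $[u, v]$ is $\alpha$-witnessed by some
point $w_{uv} \in W$. We then have $\|u - v\| \leq \|u - w_{uv}\| + \|v - w_{uv}\| \leq 2 d_2(w_{uv}) + 2\alpha$, where $d_2(w_{uv})$ stands
for the Euclidean distance from $w_{uv}$ to its second nearest landmark. According to Lemma 3.4 of [1],
we have $d_2(w_{uv}) \leq 3(\varepsilon + \delta)$, since $L$ is an $(\varepsilon + \delta)$-sample of $X$. Thus, all the vertices of $\sigma$ are included in
the Euclidean ball $B(v, 2\alpha + 6(\varepsilon + \delta))$. Moreover, for any vertex $u$ of $\sigma$ and any point $p \in \mathcal{V}_{\omega}(u) \cap X$,
we have $\|p - u'\| \leq \varepsilon + \delta$, where $u'$ is a landmark closest to $p$ in the Euclidean metric. Combined with
the fact that $\|p - u\|^2 - \omega(u)^2 \leq \|p - u'\|^2 - \omega(u')^2$, we get: $\|p - u\|^2 \leq \|p - u'\|^2 + \omega(u)^2 \leq 2(\varepsilon + \delta)^2$,
since by Lemma 3.3 of [1] we have $\omega(u) \leq 2\, \bar{\omega}\left(\frac{\varepsilon}{\mathrm{rch}(X)}\right)(\varepsilon + \delta) \leq \varepsilon + \delta$. Hence, $\mathcal{V}_{\omega}(u) \cap X$ is
included in $B(u, \sqrt{2}(\varepsilon + \delta)) \subseteq B(v, 2\alpha + (6 + \sqrt{2})(\varepsilon + \delta))$. Since this is true for every vertex $u$ of
$\sigma$, we get: $\tilde{h}_0(\sigma) \subseteq \mathrm{CH}\left(\bigcup_{u \text{ vertex of } \sigma} \mathcal{V}_{\omega}(u) \cap X\right) \subseteq B(v, 2\alpha + (6 + \sqrt{2})(\varepsilon + \delta))$. Now, $v$ belongs to
$L \subseteq W \subseteq X$, and by assumption we have $2\alpha + (6 + \sqrt{2})(\varepsilon + \delta) < \mathrm{rch}(X)$, therefore $\tilde{h}_0(\sigma)$ does not
intersect the medial axis of $X$. □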



5 Application to reconstruction 

Taking advantage of the structural results of Section 3, we devise a very simple yet provably-good
algorithm for constructing nested pairs of complexes that can capture the homology of a large
class of compact subsets of $\mathbb{R}^d$. This algorithm is a variant of the greedy refinement technique of
[30], which builds a set $L$ of landmarks iteratively, and in the meantime maintains a suitable data
structure. In our case, the data structure is composed of a nested pair of simplicial complexes, which
can be either $\mathcal{R}^{\alpha}(L) \hookrightarrow \mathcal{R}^{\alpha'}(L)$ or $\mathcal{C}_W^{\alpha}(L) \hookrightarrow \mathcal{C}_W^{\alpha'}(L)$, for specific values $\alpha < \alpha'$. Both variants of
the algorithm can be used in arbitrary metric spaces, with similar theoretical guarantees, although
the variant using witness complexes is likely to be more effective in practice. In the sequel we focus
on the variant using Rips complexes because its analysis is somewhat simpler.

5.1 The algorithm 

The input is a finite point set $W$ drawn from an arbitrary metric space, together with the pairwise
distances $l(w, w')$ between the points of $W$. In the sequel, $W$ is identified as the set of witnesses.

Initially, $L = \emptyset$ and $\varepsilon = +\infty$. At each iteration, the point of $W$ lying furthest away from $L$
in the metric $l$ is inserted in $L$, and $\varepsilon$ is set to $\max_{w \in W} \min_{v \in L} l(w, v)$. Then, $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$
are updated, and the persistent homology of $\mathcal{R}^{4\varepsilon}(L) \hookrightarrow \mathcal{R}^{16\varepsilon}(L)$ is computed using the persistence
algorithm [38]. The algorithm terminates when $L = W$. The output is the diagram showing the
evolution of the persistent Betti numbers versus $\varepsilon$, which have been maintained throughout the
process. As we will see in Section 5.2 below, with the help of this diagram the user can determine
a relevant scale at which to process the data: it is then easy to generate the corresponding subset
$L$ of landmarks (the points of $W$ have been sorted according to their order of insertion in $L$ during
the process), and to rebuild $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$. The pseudo-code of the algorithm is given in
Figure 2.

^At the first iteration, since $L$ is empty, an arbitrary point of $W$ is chosen.

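Input: $W$ finite, together with distances $l(w, w')$ for all $w, w' \in W$.
Init: Let $L := \emptyset$, $\varepsilon := +\infty$;
While $L \subsetneq W$ do
    Let $p := \mathrm{argmax}_{w \in W} \min_{v \in L} l(w, v)$;   // p chosen arbitrarily in W if L is empty
    $L := L \cup \{p\}$;
    $\varepsilon := \max_{w \in W} \min_{v \in L} l(w, v)$;
    Update $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$;
    Compute persistent homology of $\mathcal{R}^{4\varepsilon}(L) \hookrightarrow \mathcal{R}^{16\varepsilon}(L)$;
End_while
Output: diagram showing the evolution of persistent Betti numbers versus $\varepsilon$.

Figure 2: Pseudo-code of the algorithm.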
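The landmark-selection part of the loop in Figure 2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `greedy_landmarks` and `dist` are hypothetical, the points of `W` are assumed pairwise distinct, the arbitrary first point is taken to be `W[0]`, and the Rips updates and persistence computation are deliberately left out. The sketch also caches each witness's distance to $L$, which differs from the naive recomputation analyzed later in the running-time section.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def greedy_landmarks(
    W: Sequence[T], dist: Callable[[T, T], float]
) -> List[Tuple[int, float]]:
    """Farthest-point insertion order over the witnesses.

    Returns a list of (index of the inserted point in W, value of eps after
    the insertion), one entry per iteration of the main loop.
    """
    n = len(W)
    # d_to_L[j] = distance from witness j to the landmark set (infinite while L is empty).
    d_to_L = [math.inf] * n
    order: List[Tuple[int, float]] = []
    for _ in range(n):
        # Witness lying furthest away from L; index 0 on the first iteration.
        p = max(range(n), key=lambda j: d_to_L[j])
        # Inserting p can only decrease the distances to L.
        for j in range(n):
            d_to_L[j] = min(d_to_L[j], dist(W[p], W[j]))
        eps = max(d_to_L)
        order.append((p, eps))
    return order


# Four witnesses on the real line.
print(greedy_landmarks([0.0, 10.0, 4.0, 5.0], lambda a, b: abs(a - b)))
```

Because the insertion order is recorded, the landmark set at any chosen scale is simply a prefix of the returned list.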

5.2 Guarantees on the output 

For any $i > 0$, let $L(i)$ and $\varepsilon(i)$ denote respectively $L$ and $\varepsilon$ at the end of the $i$th iteration of
the main loop of the algorithm. Since $L(i)$ keeps growing with $i$, $\varepsilon(i)$ is a non-increasing function of
$i$. In addition, $L(i)$ is an $\varepsilon(i)$-sample of $W$, by definition of $\varepsilon(i)$. Hence, if $W$ is a $\delta$-sample of
some compact set $X \subset \mathbb{R}^d$, then $L(i)$ is a $(\delta + \varepsilon(i))$-sample of $X$. This quantity is less than $2\varepsilon(i)$
whenever $\varepsilon(i) > \delta$. Therefore, Theorem 3.6 provides us with the following theoretical guarantee:

Theorem 5.1 Assume that the input point set $W$ is a $\delta$-sample of some compact set $X \subset \mathbb{R}^d$,
with $\delta < \frac{1}{18}\, \mathrm{wfs}(X)$. Then, at each iteration $i$ such that $\delta < \varepsilon(i) < \frac{1}{18}\, \mathrm{wfs}(X)$, the persistent
homology groups of $\mathcal{R}^{4\varepsilon(i)}(L(i)) \hookrightarrow \mathcal{R}^{16\varepsilon(i)}(L(i))$ are isomorphic to the homology groups of $X^{\lambda}$, for
all $\lambda \in (0, \mathrm{wfs}(X))$.

This theorem ensures that, when the input point cloud $W$ is sufficiently densely sampled from a
compact set $X$, there exists a range of values of $\varepsilon(i)$ such that the persistent Betti numbers of
$\mathcal{R}^{4\varepsilon(i)}(L(i)) \hookrightarrow \mathcal{R}^{16\varepsilon(i)}(L(i))$ coincide with the ones of sufficiently small offsets $X^{\lambda}$. This means
that a plateau appears in the diagram of persistent Betti numbers, showing the Betti numbers of
$X^{\lambda}$. In view of Theorem 5.1, the width of the plateau is at least $\frac{1}{18}\, \mathrm{wfs}(X) - \delta$. The theorem also
tells where the plateau is located in the diagram, but in practice this does not help since neither
$\delta$ nor $\mathrm{wfs}(X)$ are known. However, when $\delta$ is small enough compared to $\mathrm{wfs}(X)$, the plateau is
large enough to be detected (and thus the homology of small offsets of $X$ inferred) by the user or
a software agent. In cases where $W$ samples several compact sets with different weak feature sizes,
Theorem 5.1 ensures that several plateaus appear in the diagram, showing plausible reconstructions
at various scales (see Figure 1, right). These guarantees are similar to the ones provided with the
low-dimensional version of the algorithm [30].
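One simple way a software agent might look for such a plateau is to scan the diagram for the widest range of $\varepsilon$ over which the tuple of persistent Betti numbers stays constant. The sketch below is a heuristic of our own, not a procedure from the paper: the function name `widest_plateau` and the input format (a list of `(eps, betti)` pairs in insertion order, with `betti` a tuple) are assumptions.

```python
from typing import List, Optional, Tuple

Betti = Tuple[int, ...]


def widest_plateau(
    diagram: List[Tuple[float, Betti]]
) -> Optional[Tuple[float, Betti, float, float]]:
    """Widest run of consecutive iterations with identical Betti numbers.

    Returns (width in eps, Betti numbers, eps at start of run, eps at end of run),
    or None for an empty diagram. Ties are resolved in favor of the earliest run.
    """
    best: Optional[Tuple[float, Betti, float, float]] = None
    start = 0
    for i in range(1, len(diagram) + 1):
        # Close the current run when the Betti numbers change or the data ends.
        if i == len(diagram) or diagram[i][1] != diagram[start][1]:
            eps_first, eps_last = diagram[start][0], diagram[i - 1][0]
            width = abs(eps_first - eps_last)
            if best is None or width > best[0]:
                best = (width, diagram[start][1], eps_first, eps_last)
            start = i
    return best


# Hypothetical output of the algorithm: (eps, (beta_0, beta_1)) per iteration.
diagram = [(10.0, (1, 0)), (5.0, (1, 1)), (3.0, (1, 1)), (1.0, (1, 1)), (0.5, (4, 0)), (0.0, (4, 0))]
print(widest_plateau(diagram))
```

When several compact sets are sampled at different scales, one would report all sufficiently wide runs rather than only the widest one.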

Once one or more plateaus have been detected, the user can choose a relevant scale at which to
process the data: as mentioned in Section 5.1 above, it is then easy to generate the corresponding
set of landmarks and to rebuild $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$. Differently from the algorithm of [30], the
outcome is not a single embedded simplicial complex, but a nested pair of abstract complexes whose
images in $\mathbb{R}^d$ lie at Hausdorff distance $O(\varepsilon)$ of $X$, such that the persistent homology of the nested
pair coincides with the homology of $X^{\lambda}$.

5.3 Update of $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$

We will now describe how to maintain $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$. In fact, we will settle for describing
how to rebuild $\mathcal{R}^{16\varepsilon}(L)$ completely at each iteration, which is sufficient for achieving our complexity
bounds. In practice, it would be much preferable to use more local rules to update the simplicial
complexes, in order to avoid a complete rebuilding at each iteration.

Consider the one-skeleton graph $G$ of $\mathcal{R}^{16\varepsilon}(L)$. The vertices of $G$ are the points of $L$, and its
edges are the sets $\{p, q\} \subseteq L$ such that $\|p - q\| \leq 16\varepsilon$. Now, by definition, a simplex that is not
a vertex belongs to $\mathcal{R}^{16\varepsilon}(L)$ if and only if all its edges are in $\mathcal{R}^{16\varepsilon}(L)$. Therefore, the simplices of
$\mathcal{R}^{16\varepsilon}(L)$ are precisely the cliques of $G$. The simplicial complex can then be built as follows:

1. build graph G, 

2. find all maximal cliques in G, 

3. report the maximal cliques and all their subcliques. 

Step 1. is performed within $O(|L|^2)$ time by checking the distances between all pairs of landmarks.
Here, $|G|$ denotes the size of $G$ and $|L|$ the size of $L$. To perform Step 2., we use the output-sensitive
algorithm of [37], which finds all the maximal cliques of $G$ in $O(k\, |L|^3)$ time, where $k$ is the size
of the answer. Finally, reporting all the subcliques of the maximal cliques is done in time linear in
the total number of cliques, which is also the size of $\mathcal{R}^{16\varepsilon}(L)$. Therefore,

Corollary 5.2 At each iteration of the algorithm, $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$ are rebuilt within $O(|\mathcal{R}^{16\varepsilon}(L)|\, |L|^3)$
time, where $|\mathcal{R}^{16\varepsilon}(L)|$ is the size of $\mathcal{R}^{16\varepsilon}(L)$ and $|L|$ the size of $L$.
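The clique characterization above translates into a short construction. The sketch below is a simplified stand-in and not the output-sensitive algorithm of [37]: it builds the graph $G$ and then grows cliques one vertex at a time, merging Steps 2 and 3. The function name `rips_complex`, the Euclidean metric, and the `max_dim` cut-off (which truncates the complex, whereas the text reports all cliques) are assumptions made for the example.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Simplex = Tuple[int, ...]


def rips_complex(points: Sequence[Point], max_edge_length: float, max_dim: int = 2) -> List[Simplex]:
    """Simplices (as sorted index tuples) of the Rips complex, up to dimension max_dim."""
    n = len(points)
    # Step 1: one-skeleton graph G, by checking all pairs of points.
    nbrs: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= max_edge_length:
            nbrs[i].add(j)
            nbrs[j].add(i)
    # Steps 2-3 (merged): every clique of G is a simplex; extend cliques vertex by vertex.
    simplices: List[Simplex] = [(i,) for i in range(n)]
    frontier = list(simplices)
    for _ in range(max_dim):
        extended: List[Simplex] = []
        for s in frontier:
            # A vertex extends the clique s iff it is adjacent to all vertices of s.
            common = set.intersection(*(nbrs[v] for v in s))
            # Only append larger indices so that each clique is generated once.
            extended.extend(s + (u,) for u in sorted(common) if u > s[-1])
        simplices.extend(extended)
        frontier = extended
    return simplices


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(rips_complex(pts, 1.5))
```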

5.4 Running time of the algorithm 

Let $|W|$, $|L|$, $|\mathcal{R}^{16\varepsilon}(L)|$ denote the sizes of $W$, $L$, $\mathcal{R}^{16\varepsilon}(L)$ respectively. At each iteration, point $p$
and parameter $\varepsilon$ are computed naively by iterating over the witnesses, and for each witness, by
reviewing its distances to all the landmarks. This procedure takes $O(|W|\, |L|)$ time. According to
Corollary 5.2, $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$ are updated (in fact, rebuilt) in $O(|\mathcal{R}^{16\varepsilon}(L)|\, |L|^3)$ time. Finally,
the persistence algorithm runs in $O(|\mathcal{R}^{16\varepsilon}(L)|^3)$ time [22, 38]. Hence,

Lemma 5.3 The running time of one iteration of the algorithm is $O\left(|W|\, |L| + |\mathcal{R}^{16\varepsilon}(L)|\, |L|^3 + |\mathcal{R}^{16\varepsilon}(L)|^3\right)$.

There remains to find a reasonable bound on the size of $\mathcal{R}^{16\varepsilon}(L)$, which can be done in Euclidean
space $\mathbb{R}^d$, especially when the landmarks lie on a smooth submanifold:

Lemma 5.4 Let $L$ be a finite $\varepsilon$-sparse point set in $\mathbb{R}^d$. Then, $\mathcal{R}^{16\varepsilon}(L)$ has at most $2^{33^d} |L|$ simplices.
If in addition the points of $L$ lie on a smooth $m$-submanifold $X$ of $\mathbb{R}^d$ with reach $\mathrm{rch}(X) > 16\varepsilon$,
then $\mathcal{R}^{16\varepsilon}(L)$ has at most $2^{35^m} |L|$ simplices.

Proof. Given an arbitrary point $v \in L$, we will show that the number of vertices in the star
of $v$ in $\mathcal{R}^{16\varepsilon}(L)$ is at most $33^d$. From this follows that the number of simplices in the star of $v$
is bounded by $2^{33^d}$, which proves the first part of the lemma. Let $A$ be the set of vertices in the

^Indeed, every simplex of TZ^'^^ (L) has all its vertices in X^'^^ C , and the lengths of its edges are at most 16e. 
star of $v$. These vertices lie within Euclidean distance $16\varepsilon$ of $v$, and at least $\varepsilon$ away from one
another. It follows that they are centers of pairwise-disjoint Euclidean $d$-balls of same radius $\frac{\varepsilon}{2}$,
included in the $d$-ball of center $v$ and radius $(16 + \frac{1}{2})\varepsilon$. Therefore, their number is bounded by

$$\frac{\mathrm{vol}\, B\left(v, (16 + 1/2)\varepsilon\right)}{\mathrm{vol}\, B\left(v, \varepsilon/2\right)} = \left(\frac{16 + 1/2}{1/2}\right)^d = 33^d.$$

Assume now that $v$ and the points of $A$ lie on a smooth $m$-submanifold $X$ of $\mathbb{R}^d$, such that 

II ||2 

16e < rch(X). It follows then from Lemma 6 of [28] that, for all u € A, we have — u'|| < 2reh(jy) — 
2rch^(x) ^ where u' is the orthogonal projection of u onto the tangent space of X at f , T{v). As 
a consequence, the orthogonal projections of the points of A onto T{v) lie at least ^ away from 
one another, and still at most 16e away from v. As a result, they are centers of pairwise-disjoint 
open m-balls of same radius included in the open m-ball of center v and radius (l6 -|- e 

3^^^ j < 35™, which proves the second 

part of the lemma, by the same argument as above. □ 
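
The packing bound of the first part of the lemma can be checked empirically on small inputs. The sketch below is an illustration under stated assumptions, not part of the algorithm: the helper names `greedy_sparse_subset` and `max_star_vertices` are hypothetical, points are coordinate tuples, and two landmarks are taken to be joined by an edge of the complex when their Euclidean distance is at most the given scale.

```python
import math

def greedy_sparse_subset(points, eps):
    """Keep points greedily so that kept points are pairwise at least eps apart."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in kept):
            kept.append(p)
    return kept

def max_star_vertices(landmarks, scale):
    """Largest number of other landmarks lying within `scale` of a single landmark."""
    best = 0
    for i, v in enumerate(landmarks):
        count = sum(
            1
            for j, u in enumerate(landmarks)
            if j != i and math.dist(u, v) <= scale
        )
        best = max(best, count)
    return best
```

For an eps-sparse set in dimension d, the lemma predicts that `max_star_vertices(landmarks, 16 * eps)` never exceeds `33 ** d`.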

In cases where the input point cloud $W$ lies on a smooth $m$-submanifold $X$ of $\mathbb{R}^d$, the above 
result suggests that the course of the algorithm goes through two phases: first, a transition phase, 
in which the landmark set $L$ is too coarse for the dimensionality of $X$ to have an influence on 
the shapes and sizes of the stars of the vertices of $\mathcal{R}^{16\varepsilon}(L)$; second, a stable phase, in which the 
landmark set is dense enough for the dimensionality of $X$ to play a role. This fact is quite intuitive: 
imagine $X$ to be a simple closed curve, embedded in $\mathbb{R}^d$ in such a way that it roughly fills in the 
space within the unit $d$-ball. Then, for large values of $\varepsilon$, the landmark set $L$ is nothing but a 
sampling of the $d$-ball, and therefore the stars of its points in $\mathcal{R}^{16\varepsilon}(L)$ are $d$-dimensional. 

Let $i_0$ be the last iteration of the transition phase, i.e. the last iteration such that $\varepsilon(i_0) > 
\mathrm{rch}(X)$. Then, Lemmas 5.3 and 5.4 imply that the time complexity of the transition phase is 
$O(|W||L(i_0)|^2 + 8^{33^d}|L(i_0)|^5)$, while that of the stable phase is $O(8^{35^m}|W|^5)$. We can get rid 
of the terms depending on $d$ in at least two ways: 

• The first approach has a rather theoretical flavor: it consists in amortizing the cost of the 
transition phase by assuming that $W$ is sufficiently large. Specifically, since $L(i_0)$ is an $\varepsilon(i_0)$-sparse 
sample of $X$, with $\varepsilon(i_0) > \mathrm{rch}(X)$, the size of $L(i_0)$ is bounded from above by some quantity 
$c_0(X)$ that depends solely on the (smooth) manifold $X$ (see e.g. [S] for a proof in the special 
case of smooth surfaces). As a result, we have $8^{33^d}|L(i_0)|^k \le 8^{35^m}|W|^k$ for all $k \ge 1$ whenever 
$|W| \ge 8^{33^d - 35^m} c_0(X)$. This condition on the size of $W$ translates into a condition on $\delta$, by a 
similar argument to the one invoked above. 

• The second approach has a more algorithmic flavor, and it is based on a backtracking strategy. 
Specifically, we first run the algorithm without maintaining $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$, which simply sorts 
the points of $W$ according to their order of insertion in $L$. Then, we run the algorithm backwards, 
starting with $L = L(|W|) = W$ and considering at each iteration $j$ the landmark set $L(|W| - j)$. 
During this second phase, we do maintain $\mathcal{R}^{4\varepsilon}(L)$ and $\mathcal{R}^{16\varepsilon}(L)$ and compute their persistent Betti 
numbers. If $W$ samples $X$ densely enough, then Theorem 5.1 ensures that the relevant plateaus 
will be computed before the transition phase starts, and thus before the size of the data structure 
becomes independent of the dimension of $X$. It is then up to the user to stop the process when the 
space complexity becomes too large. 
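
The two passes of the backtracking strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the names `farthest_point_order` and `backward_landmark_sets` are hypothetical, the first landmark is chosen arbitrarily as the first input point, and the first pass keeps a running array of distances to the landmark set instead of rescanning all landmarks at every step. Building the complexes and computing persistence on each landmark set is left to the caller.

```python
import math

def farthest_point_order(points):
    """First pass: order the points by farthest-point insertion.

    Returns the insertion order (indices into `points`) and, for each insertion
    after the first, the distance of the inserted point to the landmark set.
    """
    n = len(points)
    order = [0]          # arbitrary seed landmark (assumption of this sketch)
    inserted = {0}
    dist_to_L = [math.dist(p, points[0]) for p in points]
    radii = []
    while len(order) < n:
        # Point of W lying farthest away from the current landmark set.
        i = max((j for j in range(n) if j not in inserted),
                key=lambda j: dist_to_L[j])
        radii.append(dist_to_L[i])
        order.append(i)
        inserted.add(i)
        # Update every point's distance to the enlarged landmark set.
        for j in range(n):
            dist_to_L[j] = min(dist_to_L[j], math.dist(points[j], points[i]))
    return order, radii

def backward_landmark_sets(points, order):
    """Second pass: yield the landmark sets from the densest (all of W) to the coarsest."""
    for size in range(len(order), 0, -1):
        yield [points[i] for i in order[:size]]
```

A caller would iterate over `backward_landmark_sets`, maintain the two complexes for each landmark set, and break out of the loop once the data structure exceeds a chosen memory budget.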

In both cases, we get the following complexity bounds: 



^Note that, at every iteration $i$ of the algorithm, $L(i)$ is an $\varepsilon(i)$-sparse point set, since the algorithm always inserts 
in $L$ the point of $W$ lying furthest away from $L$ (see e.g. [30, Lemma 4.1]). 
Theorem 5.5 If $W$ is a point cloud in Euclidean space $\mathbb{R}^d$, then the running time of the algorithm 
is $O(8^{33^d}|W|^5)$, where $|W|$ denotes the size of $W$. If in addition $W$ is a $\delta$-sample of some smooth 
$m$-submanifold of $\mathbb{R}^d$, with $\delta$ small enough, then the running time becomes $O(8^{35^m}|W|^5)$. 
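
As a sanity check on the second bound, the stable-phase cost can be summed over the iterations. The computation below is a sketch under assumptions: it takes the per-iteration cost of Lemma 5.3 to be $O(|W||L| + |\mathcal{R}^{16\varepsilon}(L)|\,|L|^3 + |\mathcal{R}^{16\varepsilon}(L)|^3)$, uses the size bound $|\mathcal{R}^{16\varepsilon}(L)| \le 2^{35^m}|L|$ of Lemma 5.4, and uses the fact that there are at most $|W|$ iterations, the landmark set at iteration $i$ having at most $i$ points.

```latex
\sum_{i=1}^{|W|} \Big( |W|\, i + 2^{35^m} i^4 + 8^{35^m} i^3 \Big)
  \;\le\; |W|^3 + 2^{35^m} |W|^5 + 8^{35^m} |W|^4
  \;=\; O\!\big( 8^{35^m} |W|^5 \big).
```

The transition phase contributes the terms depending on $d$, which are absorbed by either of the two approaches described above.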

6 Conclusion 

This paper makes effective the approach developed in [12, 15] by providing an efficient, provably 
good and easy-to-implement algorithm for topological estimation of general shapes in any 
dimension. Our theoretical framework can also be used for the analysis of other persistence-based 
methods. Addressing a weaker version of the classical reconstruction problem, we introduce an 
algorithm that ultimately outputs a nested pair of complexes at a user-defined scale, from which 
the homology of the underlying shape $X$ is inferred. When $X$ is a smooth submanifold of $\mathbb{R}^d$, the 
complexity scales up with the intrinsic dimension of $X$. These results provide a new step towards 
reconstructing (low-dimensional) manifolds in high-dimensional spaces in reasonable time with 
topological guarantees. It is now tempting to tackle the more challenging problem of constructing 
an embedded simplicial complex that is topologically and geometrically close to the sampled shape. 
As a first step, we intend to adapt our method to provide a single output complex that has the 
same homology as $X$, using for instance the sealing technique of [25]. 